Table of Contents

CzechIdM database maintenance

CzechIdM's database backend holds data that are critical to identity manager functionality and performace. As any part of the deployment, the RDBMS needs regular maintenance so its performance will not degrade. This HOWTO is primarily aimed at PostgreSQL as that is a database used on most of our projects. However, some parts of it are also true for the MSSQL.

Monitoring database performance

To be able to determine possible performance issues, having real-time monitoring in place is the key. To setup basic monitoring, see this howto. Your monitoring setup should be able to graph measured values. This is invaluable when you have to localize problem's timeline and trends or to make assumptions about database growth (e.g. to add disk space).

OS-level monitoring

On the OS level, you should monitor those things. Preferred way is to use external monitoring system (Nagios/Centreon/Icinga, Munin, Zabbix, …).

Database-level monitoring

Tuning the database

From the previous section, it should be clear that proper maintenance of the database is difficult and never ending task. And yet, we barely scratched the surface of this topic.

We already mentioned most implications of high resource utilization and we will not repeat it there. We will discuss a few handy tips instead.

Defragmenting database files on the filesystem

On Linux, you usually do not need this. But in case you ever need to, the VACUUM FULL ANALYZE table_name on the particular set of tables is a way to go. This command causes PostgreSQL to copy all data in the table to a new datafile (DBF), and swap the old and new files on-the-fly. Database engine holds an exclusive lock on a table during VACUUM FULL and the table is effectively inaccessible to sessions.

On Windows, you should periodically check the DBF fragmentation to make sure it stays in reasonable levels. There should be a standard Windows' Defragmentation task scheduled to run periodically. The default (once a week) should be enough. In case the DBF fragmentation is high and standard defrag.exe does not seem to help, perform the VACUUM FULL ANALYZE table_name on the table that corresponds to the fragmented DBF.

It is possible to perform similar set of operations using pg\_repack from the pgRepack project. Note: Although it offers similar functionality to VACUUM FULL, we currently do not have sufficient data on pg_repack's ability to reduce FS fragmentation of the DBF.

Reducing bloat

Each database implements a multi-version concurrency control (MVCC). If it didn't it could not be able to isolate database transactions. In a nutshell, MVCC creates a versioned snapshot of each changed row for "each" transaction that is already in process. Once the transaction ends, MVCC algorithm decides which version of the row is the correct result of the transaction and this row becomes "official version" of the row that is visible to all transactions from now on.

Row versions in other database engines are stored as UNDO segments (Oracle) or TEMPDB (MSSQL). When the transaction ends, obsolete row versions are cleaned-up. Latest row versions are always stored in the table itself and old row versions are stored in the UNDO/TEMPDB. This has interesting consequences:

PostgreSQL stores row versions directly in database table. When the transaction ends, obsolete rows are not cleaned in any way. There is, however, a garbage-collecting thread called AUTOVACUUM which goes through all tables in the database and removes obsolete rows, reclaiming free space in the DBF. Consequences:

Although the bloat should be reclaimed automatically, it may not happen frequently enough. In case you have to do a maintenance, run VACUUM ANALYZE table_name on the bloated table. This command reclaims free space from the tables (and corresponding DBF) and shrinks the DBF a bit - the space between high water mark and the physical end of the datafile is reclaimed and returned to OS. Plain VACUUM can be run without affecting database clients.

To monitor bloat ration on your tables and indexes, you can use this project.

If you want to reclaim all free space from DBF to OS, you have to use VACUUM FULL (which locks the table) or use additional module like pgRepack (which can do it without extensive locking).

Optimizing AUTOVACUUM

Although PostgreSQL does a good job in self maintenance, some tables may remain bloated indefinitely. This is because of AUTOVACUUM settings.

PostgreSQL has plenty of parameters to tune the autovacuum. You can set them either globally in the configuration file or per table using ALTER TABLE statement.

Autovacuum is ran on such called "candidate tables".

The interesting thing about those formulas is that they depend on total number of rows in a table. This means that the bigger the table is, the more changes have to happen on it before it is considered a candidate table. As a result, big tables (even with heavy activity) tend to be vacuumed less frequently than small tables. Those big tables tend to accumulate bloat more and more over time.

Autovacuum runs periodically and waits autovacuum\_naptime between its executions. There are autovacuum\_max\_workers threads that perform the vacuum operation. There can be only one worker processing a particular table and autovacuum\_max\_workers are defined globally per database instance. This means that all databases in the RDBMS share a pool of those worker threads.

Moreover, because vacuum can be I/O and CPU-hungry, there is a global limit on vacuum cost autovacuum\_vacuum\_cost\_limit that is divided between autovacuum worker threads. When the autovacuum worker hits the limit, it will sleep for a autovacuum\_cost\_delay milliseconds. Vacuuming a huge table can then take multiple runs of the vacuum thread. It the table is frequently updated and the autovacuum does not keep up, this results in bloat accumulation.

Recommendations for AUTOVACUUM tuning:

Rebuilding indexes

When using btree indexes (which IdM does a lot), indexes can become bloated. Bloated index has many dead leaves that are worth reclaiming. They are bigger and thus harder to manage and traverse. Such index does not have its tree-like structure and is partly degenerated to a linked list. This worsens performance from theoretical *O(log(n))* to *O(n)* in some cases. Practical effect is, the index lookup is slower when using bloated index.

It is worth noting that having index that is "big" in terms of size is generally not a problem. Depending on the database, having index be bigger than the indexed table itself may even be a feature. The point of having index is to make lookups fast and to minimize locking. Having tightly-packed index is partially undesirable because most of the time, you insert new leaves somewhere into the middle of the btree. If the index has no spare space it has to be either rebalanced or skewed. In the long run, this is a performance hit waiting to happen.

From time to time, the database can benefit from index rebuild which removes the bloat.

REINDEX INDEX index_name; -- to recreate specific index
REINDEX TABLE TABLE_NAME; -- to recreate all indexes on the table
REINDEX DATABASE database_name; -- to recreate all indexes in a database
-- and other variants

The REINDEX command needs to lock all write operations on the indexed table, so it usually conflicts with normal database operations.

Since PostgreSQL 12, the REINDEX … CONCURRENTLY can be used. This command performs online rebuild of the index without blocking write operations on the table. See official documentation for details.

Tuning antivirus software

Having antivirus software on a database machine can be a performance killer. Antivirus software hooks itself to the OS kernel and basically inspects all data going through syscalls - network data, disk accesses, etc. For database which is I/O expensive appliance, this may lower the performance by up to 80% (we have actually seen this happen on Windows Server 2012R2 with a Symantec Endpoint Protection).

Symptoms of having this problem generally are:

If you are absolutely positive you need to run an antivirus software on the database machine, configure exceptions where appropriate.