CzechIdM database maintenance

CzechIdM's database backend holds data that are critical to identity manager functionality and performance. Like any other part of the deployment, the RDBMS needs regular maintenance so that its performance does not degrade. This HOWTO is primarily aimed at PostgreSQL, as that is the database used on most of our projects. However, some parts of it also apply to MSSQL.

To be able to determine possible performance issues, having real-time monitoring in place is key. To set up basic monitoring, see this howto. Your monitoring setup should be able to graph measured values. This is invaluable when you need to localize a problem's timeline and trends, or to make assumptions about database growth (e.g. to add disk space).

On the OS level, you should monitor the following. The preferred way is to use an external monitoring system (Nagios/Centreon/Icinga, Munin, Zabbix, …).

  • CPU utilization by the RDBMS.
    • High CPU utilization may be caused by suboptimal configuration of the database engine. Each database client connection has a few kilobytes or megabytes of memory allocated to it, and sorting operations are performed in this memory. If you execute an SQL query such as SELECT … FROM … ORDER BY …, the ordering is performed in the memory area of the client connection that ran the query. High CPU consumption may be a symptom of this area being too small - in that case, the RDBMS usually performs part of the sort on disk, which also leads to higher I/O load (a work\_mem sketch follows after this list).
    • High CPU utilization might mean the database instance is undersized.
    • High CPU utilization may be caused by stuck or badly written SQL queries.
    • High CPU utilization may be caused by stuck (AUTO)VACUUM.
    • Monitoring CPU utilization:
      • Linux: top and its variants, sysstat package for long-term monitoring.
  • RAM utilization.
    • High RAM utilization may or may not be a problem, depending on the intended RDBMS instance sizing. As a rule of thumb, the database should be sized to about
      • DB instance = total RAM - 1GB - (1MB * number of connections) for Linux and
      • DB instance = total RAM - 2GB - (2MB * number of connections) for Windows hosts.
      • To make sure particular configuration parameters are set correctly, it is better to consult a calculator.
    • Monitoring RAM utilization:
      • Linux: free, top and its variants, vmstat, sysstat package for long-term monitoring.
  • Swap space utilization.
    • The system should not need to use the swap partition. Some systems and applications use it more than others. As long as swap utilization stays within, say, tens of megabytes, it is not a problem at all.
    • High swap utilization means that the server probably does not have enough RAM and is using disk storage as well. This harms performance because accesses to RAM (~5ns) become accesses to disk (~10ms), which is about 10^6 times slower.
    • Monitoring swap utilization:
      • Linux: free, top and its variants, vmstat, sysstat package for long-term monitoring.
  • I/O utilization.
    • I/O utilization heavily depends on your underlying infrastructure - disk technology, local disk vs. SAN, server HW, network cards, etc.
    • IdM is read/write heavy during batch job (LRT) execution. In those cases, having high I/O traffic to disk is normal.
    • When IdM is not executing LRTs, the I/O traffic to disks should be relatively small and most requests should be served from the RDBMS cache. If you observe high I/O all the time, the database instance sizing should be revised and adjusted.
    • High iowait is almost never a good sign. If your iowait exceeds about 5% on average, or if there are iowait peaks, revise your storage architecture and database instance sizing.
    • Network I/O between IdM and the database is usually high. As long as you do not observe high iowait on the database client (IdM), which could indicate network saturation, high network traffic is considered pretty normal.
    • Monitoring I/O utilization:
      • Linux: iotop, iostat, vmstat, sysstat package for long-term monitoring.
  • Database file (DBF) fragmentation on the filesystem.
    • When appending to the end of a datafile during INSERT (or UPDATE, because PostgreSQL implements an update as inserting a new row version and deleting the old one), the datafile grows on the OS filesystem. A growing datafile means the filesystem may need to allocate a new file fragment. On append-heavy tables, this results in DBF fragmentation and, in effect, in bad performance when working with non-cached data. The RDBMS must seek over some (or even all, when doing sequential table scans) file fragments, which means many disk seeks, and performance suffers.
    • On Linux, this should not be a problem. Widely-used filesystems (ext3, ext4, xfs) reserve some extra headroom for a file. When the file grows, this extra space is used, reducing the number of new fragments needed.
    • On Windows, this can become a problem when left unchecked. The NTFS filesystem packs file data together into one contiguous block and does not account for the possibility that files may grow. A growing DBF means NTFS allocates a new fragment just for the growth that is needed. When the DBF grows again, NTFS allocates another fragment, and so on. This leads to noticeably worse performance of PostgreSQL on Windows machines.
    • Monitoring DBF fragmentation:
  • Sessions statistics.
    • Total number of sessions. - During the lifetime of a server, many clients and applications use it, and reconfigurations and new versions happen. Gradually, max\_connections (the maximum number of simultaneous sessions) can become too small. It is thus handy to observe how many connections access the database.
    • Running (active) sessions. - Running sessions are those sessions where a SQL query is being executed. This statistic is important because you can check how many queries are executed in parallel and find possible bottlenecks in your application (e.g. bad JDBC pooling configuration), database schema (missing index resulting in exclusive lock) or queries (locking more tables/rows than necessary).
    • Idle sessions. - A connected client that is not executing any SQL query. A special case of this is IDLE IN TRANSACTION, meaning the client (IdM) is in the middle of a database transaction but currently not executing SQL. A session which is IDLE IN TRANSACTION for a long time means that the client application is chewing data on its side for a very long time (= possible application problem) before it finally COMMITs or ROLLBACKs the database transaction. Such long-running transactions have an overall bad effect on database performance because they tend to hold locks on data. They also worsen table bloat because PostgreSQL has to keep old versions of rows for a long time.
    • Blocked (waiting) sessions. - Sessions that are running but waiting for data (because of a DB lock or disk I/O). Ideally there should not be any of those. In practice, any query that is blocked for more than a few milliseconds (~disk access time) is a candidate for a bit of debugging (see the blocked-session query after this list).
    • Simple monitoring is possible through pg\_stat\_activity:
      SELECT pid, -- PID of the backend process allotted to the session
             usename, -- username
             datname, -- database name
             backend_start, -- connect time of the session
             state, -- active / idle / idle in transaction / ...
             wait_event_type, -- what the query is waiting for (Lock, IO, ...), NULL when not waiting
             wait_event, -- concrete wait event; replaces the old "waiting" column since PostgreSQL 9.6
             state_change, -- timestamp of the last state change
             query -- SQL command of the query that is running / was run last
             -- and many other columns
      FROM pg_stat_activity;
  • Slow queries.
    • Some queries the IdM performs on the database may be slow for various reasons (schema problem, bug, high iowait, etc.). We recommend enabling logging of slow queries in the RDBMS, setting the threshold to 500 ms (longer queries get logged), and periodically checking the log file. There should not be any >500 ms queries at all, so any query that makes it to the logfile should be revised.
    • To turn slow query logging on, add the log\_min\_duration\_statement = 500 setting to postgresql.conf and reload the database engine (an SQL-only alternative is sketched after this list).
  • Table and index bloat monitoring.
    • Unlike other database engines, PostgreSQL's MVCC model does not use a separate UNDO/TEMPDB and the like. It stores row versions directly in the table. This means that any UPDATE and DELETE (and possibly INSERT, when used only for storing a transaction-temporary result) generates bloat. Bloat exists on tables and indexes and consists of old versions of rows that were marked for deletion but not yet garbage-collected by AUTOVACUUM. High bloat is bad for performance and can be caused by badly-written SQL queries, frequent batch jobs, the current AUTOVACUUM configuration, etc.
    • For periodic bloat monitoring, you can use for example this project.
  • Database engine performance.
    • When diagnosing a problem, an administrator usually does a quick check of the OS (CPU, RAM, swap, I/O) and the RDBMS (CPU consumption, sessions, cache hit rate, running queries). A very useful tool that integrates most of this information is pg\_top.
  • Database query debugging.
    • The internal EXPLAIN command shows query execution plans (EXPLAIN ANALYZE additionally runs the query and measures it); see the example after this list.
    • The usual tool for database-wide debugging is pgBadger (an older app for this is pgFouine). This utility parses the RDBMS log and generates a plethora of reports on the performance of queries and the database itself. It requires a database logfile with specifically-set logging options. Such logging is very verbose, and verbose logging kills RDBMS performance. Therefore, this utility should be used only when necessary.
    • We tested pgBadger on Git Bash on Windows and, since it is an almost self-contained Perl script, its basic functionality runs fine.
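
As noted in the CPU utilization item above, the per-connection sort memory in PostgreSQL is controlled by the work\_mem parameter. A minimal sketch of checking and adjusting it; the concrete values are purely illustrative, not a recommendation:

SHOW work_mem;                      -- current per-sort memory limit (the default is only a few MB)
SET work_mem = '64MB';              -- raise it for the current session only, e.g. for a heavy report
ALTER SYSTEM SET work_mem = '16MB'; -- global change (PostgreSQL 9.4+), written to postgresql.auto.conf
SELECT pg_reload_conf();            -- apply the global change without restarting the engine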
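
For the blocked (waiting) sessions item above, PostgreSQL 9.6 and newer can also tell you which session is blocking which. A minimal sketch using the built-in pg\_blocking\_pids() function:

SELECT pid,                                 -- the blocked session
       pg_blocking_pids(pid) AS blocked_by, -- PIDs of the sessions holding the conflicting locks
       state,
       now() - query_start AS waiting_for,  -- how long the current query has been running
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;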
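
The slow query logging described above can also be enabled from SQL instead of editing postgresql.conf by hand (PostgreSQL 9.4+); a short sketch using the 500 ms threshold recommended above:

ALTER SYSTEM SET log_min_duration_statement = 500; -- log every statement running longer than 500 ms
SELECT pg_reload_conf();                           -- reload the configuration without a restart
-- setting the parameter to -1 turns slow query logging off again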
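
For the EXPLAIN item above, a minimal sketch; the query itself is only an example and runs against a system view, so it works on any PostgreSQL database:

-- show the execution plan without running the query
EXPLAIN SELECT * FROM pg_stat_activity WHERE state = 'active';

-- actually run the query and report real timings and buffer usage
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_stat_activity WHERE state = 'active';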

From the previous section, it should be clear that proper maintenance of the database is a difficult and never-ending task. And yet, we have barely scratched the surface of this topic.

We have already mentioned most implications of high resource utilization and will not repeat them here. We will discuss a few handy tips instead.

On Linux, you usually do not need this. But in case you ever do, running VACUUM FULL ANALYZE table_name on the particular set of tables is the way to go. This command causes PostgreSQL to copy all data in the table to a new datafile (DBF) and swap the old and new files on the fly. The database engine holds an exclusive lock on the table during VACUUM FULL, so the table is effectively inaccessible to other sessions.
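
A minimal sketch of such a maintenance run; idm\_logging\_event is just an illustrative table name, and the pg\_total\_relation\_size() calls only show how to verify that the datafile really shrank:

SELECT pg_size_pretty(pg_total_relation_size('idm_logging_event')); -- size before (table + indexes + TOAST)
VACUUM FULL ANALYZE idm_logging_event;                              -- rewrites the table, holds an exclusive lock
SELECT pg_size_pretty(pg_total_relation_size('idm_logging_event')); -- size after the rebuild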

On Windows, you should periodically check the DBF fragmentation to make sure it stays at reasonable levels. There should be a standard Windows Defragmentation task scheduled to run periodically. The default (once a week) should be enough. In case the DBF fragmentation is high and the standard defrag.exe does not seem to help, perform VACUUM FULL ANALYZE table_name on the table that corresponds to the fragmented DBF.

It is possible to perform a similar set of operations using pg\_repack from the pgRepack project. Note: Although it offers similar functionality to VACUUM FULL, we currently do not have sufficient data on pg\_repack's ability to reduce FS fragmentation of the DBF.

Each database engine implements some form of multi-version concurrency control (MVCC); without it, it would not be able to isolate database transactions. In a nutshell, MVCC creates a versioned snapshot of each changed row for "each" transaction that is already in progress. Once the transaction ends, the MVCC algorithm decides which version of the row is the correct result of the transaction, and this row becomes the "official" version of the row, visible to all transactions from then on.

Row versions in other database engines are stored in UNDO segments (Oracle) or in TEMPDB (MSSQL). When the transaction ends, obsolete row versions are cleaned up. The latest row versions are always stored in the table itself and old row versions are stored in the UNDO/TEMPDB. This has interesting consequences:

  • Transaction COMMIT is very fast because data is already in the table.
  • Transaction ROLLBACK is slower because data have to be reconstructed from the UNDO and moved to the table.
  • Because row versioning happens in completely separate storage, there is no bloat on the table itself. The cost of this is that you have to manage the UNDO storage - too little space or too short a retention results in failing queries and well-known errors like ORA-01555: snapshot too old.

PostgreSQL stores row versions directly in the database table. When a transaction ends, obsolete rows are not cleaned up in any way. There is, however, a garbage-collecting process called AUTOVACUUM which goes through all tables in the database and removes obsolete rows, reclaiming free space in the DBF. Consequences:

  • Since everything about row versioning happens in the database table, there is no need to recover old row versions from separate storage. Thus, speed of transaction COMMIT and ROLLBACK is the same.
  • Client activity (transactions) generates bloat on tables and indexes. This is normal, the bloat is periodically reclaimed.
  • Because PostgreSQL implements UPDATE as an INSERT of a new row version and DELETE of the old row version, even the UPDATE statements generate bloat.
  • INSERT statements can also generate bloat, for example if they are used for storing temporary results in some scratchpad table.
  • Creating a lot of temporary tables and dropping them afterwards causes INSERTs and DELETEs in the PostgreSQL catalog tables. If you use this technique in your batch jobs, you indirectly generate bloat in the database's system tables.

Although the bloat should be reclaimed automatically, it may not happen frequently enough. In case you have to do maintenance, run VACUUM ANALYZE table_name on the bloated table. This command reclaims free space from the table (and the corresponding DBF) and shrinks the DBF a bit - the space between the high water mark and the physical end of the datafile is reclaimed and returned to the OS. A plain VACUUM can be run without affecting database clients.

To monitor the bloat ratio on your tables and indexes, you can use this project.
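
For a quick, rough check without any external tooling, the built-in statistics views can be queried directly. This is only an approximation (dead tuples as reported by the statistics collector), not a replacement for a proper bloat query:

SELECT relname,
       n_live_tup,
       n_dead_tup,      -- dead row versions not yet reclaimed by (AUTO)VACUUM
       round(100.0 * n_dead_tup / greatest(n_live_tup + n_dead_tup, 1), 2) AS dead_pct,
       last_autovacuum  -- when autovacuum last processed the table
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;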

If you want to reclaim all free space from the DBF back to the OS, you have to use VACUUM FULL (which locks the table) or an additional module like pg\_repack (which can do it without extensive locking).

Although PostgreSQL does a good job at self-maintenance, some tables may remain bloated indefinitely. This is usually a consequence of the AUTOVACUUM settings.

PostgreSQL has plenty of parameters to tune the autovacuum. You can set them either globally in the configuration file or per table using the ALTER TABLE statement.
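
A minimal sketch of both approaches; the table name idm\_audit and the concrete values are purely illustrative, not a recommendation:

-- global default, applied via ALTER SYSTEM (PostgreSQL 9.4+) instead of editing postgresql.conf
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.1;
SELECT pg_reload_conf();

-- per-table override for a table that accumulates bloat faster than the others
ALTER TABLE idm_audit SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold    = 1000
);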

Autovacuum is run on so-called "candidate tables".

  • A table is not a candidate table for VACUUM until the number of dead rows in it crosses a threshold computed as (autovacuum\_vacuum\_scale\_factor * number of all rows in the table) + autovacuum\_vacuum\_threshold.
  • A table is not a candidate table for ANALYZE until the number of rows changed since the last ANALYZE crosses a threshold computed as (autovacuum\_analyze\_scale\_factor * number of all rows in the table) + autovacuum\_analyze\_threshold.

The interesting thing about these formulas is that they depend on the total number of rows in a table. This means that the bigger the table is, the more changes have to happen to it before it is considered a candidate table. For example, with the default autovacuum\_vacuum\_scale\_factor = 0.2 and autovacuum\_vacuum\_threshold = 50, a table with 10 000 rows becomes a candidate after about 2 050 dead rows, while a table with 1 000 000 rows needs about 200 050. As a result, big tables (even with heavy activity) tend to be vacuumed less frequently than small tables and tend to accumulate more and more bloat over time.

Autovacuum runs periodically and waits autovacuum\_naptime between its executions. There are autovacuum\_max\_workers processes that perform the vacuum operation. Only one worker can process a particular table at a time, and autovacuum\_max\_workers is defined globally per database instance. This means that all databases in the RDBMS share a pool of these worker processes.

Moreover, because vacuum can be I/O- and CPU-hungry, there is a global limit on vacuum cost, autovacuum\_vacuum\_cost\_limit, which is divided between the autovacuum workers. When an autovacuum worker hits the limit, it sleeps for autovacuum\_vacuum\_cost\_delay milliseconds. Vacuuming a huge table can then take multiple runs of the vacuum worker. If the table is frequently updated and the autovacuum does not keep up, this results in bloat accumulation.
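
To check what the autovacuum is currently allowed to do, the effective settings can be read directly from the catalog; a short sketch:

SELECT name, setting, unit  -- current global autovacuum configuration
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;

SELECT relname, reloptions  -- per-table overrides, if any were set via ALTER TABLE
FROM pg_class
WHERE relkind = 'r' AND reloptions IS NOT NULL;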

Recommendations for AUTOVACUUM tuning:

  • Check trends of bloat on your tables. Set lower autovacuum\_vacuum\_scale\_factor for tables that tend to accumulate bloat over time.
  • Set lower autovacuum\_vacuum\_scale\_factor for big tables.
  • The bigger and more frequently used the table is, the more aggressively the autovacuum should run.
  • Log long-running autovacuums. You can use the log\_autovacuum\_min\_duration parameter for this. The philosophy is the same as for slow query logging.
  • For a complete list of parameters, consult the official documentation.

When using btree indexes (which IdM does a lot), indexes can become bloated. A bloated index has many dead leaves that are worth reclaiming. It is bigger and thus harder to manage and traverse. Such an index loses its tree-like structure and partly degenerates into a linked list. This worsens performance from the theoretical *O(log(n))* to *O(n)* in some cases. The practical effect is that index lookups are slower when using a bloated index.

It is worth noting that an index that is "big" in terms of size is generally not a problem. Depending on the database, having an index bigger than the indexed table itself may even be a feature. The point of having an index is to make lookups fast and to minimize locking. A tightly-packed index is partially undesirable because most of the time, you insert new leaves somewhere into the middle of the btree. If the index has no spare space, it has to be either rebalanced or it becomes skewed. In the long run, this is a performance hit waiting to happen.

From time to time, the database can benefit from an index rebuild, which removes the bloat.

REINDEX INDEX index_name; -- to recreate specific index
REINDEX TABLE table_name; -- to recreate all indexes on the table
REINDEX DATABASE database_name; -- to recreate all indexes in a database
-- and other variants

The REINDEX command needs to lock all write operations on the indexed table, so it usually conflicts with normal database operations.

Since PostgreSQL 12, REINDEX … CONCURRENTLY can be used. This command performs an online rebuild of the index without blocking write operations on the table. See the official documentation for details.
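
A minimal sketch, assuming PostgreSQL 12 or newer; the index and table names used here are purely illustrative:

REINDEX INDEX CONCURRENTLY idx_idm_identity_username; -- rebuild a single index without blocking writes
REINDEX TABLE CONCURRENTLY idm_identity;              -- rebuild all indexes on one table the same way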

Having antivirus software on a database machine can be a performance killer. Antivirus software hooks itself into the OS kernel and basically inspects all data going through syscalls - network data, disk accesses, etc. For a database, which is an I/O-heavy application, this may lower performance by up to 80% (we have actually seen this happen on Windows Server 2012R2 with Symantec Endpoint Protection).

Symptoms of this problem generally are:

  • Antivirus installed on the machine (obviously).
  • Overall low performance of the RDBMS.
  • Relatively low CPU utilization.
  • High context switch rate.

If you are absolutely positive you need to run antivirus software on the database machine, configure exceptions where appropriate.

  • Exceptions for accessing datafiles.
  • Exceptions for network communication of the database process.