====== Database design ======

{{tag>database}}

[[.:indexmenu_n_40?media=devel:documentation:architecture:dev:indexmenu_n_40|indexmenu_n_40]] <note tip>[[..:..:conventions:dev:database-conventions|]]</note>

===== Database scripts =====

For the automatic creation and modification of the database scheme, the [[.:flyway|]] is used.

===== JPA metamodel =====

We are using generated jpa metamodel by [[https://github.com/bcvsolutions/CzechIdMng/tree/develop/Realization/backend#setup-jpa-metamodel-generation-in-eclipse|configuration]], where criteria api is used.

<code java>
...
// id
if (filter.getId() != null) {
  predicates.add(builder.equal(root.get(AbstractEntity_.id), filter.getId()));
}
...

</code>

===== Primary keys (bigint vs. uuid) =====

Pros and cons of the use of uuid as the primary key for the database.

  * **Pros**:
      * **Easier data migration**  between the environments (e.g. cloning the pre-production to production)
      * generating the identifier on the part of the client (distributed system, offline access)
      * impossibility of tipping the url (the guid is not sequential)
  * **Cons**:
      * double size of the indexes compared to ''bigint''  (''8B''  vs ''16B'') - for numbers, see the next chapter with testing
      * the uuid is unreadable at the first sight directly in the database - the uuid is saved in the data type ''binary(16)''. If the uuid was saved to be readable, this would imply more disadvantages:
        * uuid native data type for the databse in question (e.g. uuid in posqresql) would result in the need to maintain a set of entities according to the chosen database on the application level
        * ''string''  - the size of the column would be of ''char(36)'', which would make the indexes 4 times bigger compared to the use of data type ''bigint''
      * some databases always save the primary key together with the data (clustered index, MySQL/InnoDB), which may lead to data fragmentation
      * illegible / unmemorable url
      * Different uuid representation [[https://stackoverflow.com/questions/41651681/different-representation-of-uuid-in-java-hibernate-and-sql-server|in application and MSSQL database]].

Listing and searching according to the string guid in PostgreSQL:

<code sql>
SELECT encode(i.id, 'hex')::uuid, i.* FROM idm_identity i WHERE i.id = uuid_send('5cd7ec1c-6b34-45b7-9835-e533e9374ebf'::uuid)

</code>

Generating uuid without uuid-ossp package instaled in PosgreSQL:

<code sql>
INSERT INTO idm_configuration(id, created, creator, name, VALUE, secured, confidential) VALUES (
    decode(md5(random()::text || clock_timestamp()::text), 'hex'),
    now(),
    '[SYSTEM]',
    'idm.sec.core.tree.defaultType',
    (SELECT id FROM idm_tree_type WHERE default_tree_type = TRUE),
    TRUE,
    FALSE);

</code>

==== Primary keys testing ====

The aim of the testing should be to verify the steady time of data record / reading in the chart with an increasing number of records and the size of the charts and indexes. Two charts in the binding 1:N (master/detail) with the primary keys, indexes for the uniqueness of the record name (master name and detail name in one master) and for the index of the relation between the masters - detail for searching all the details of the master in question were created (todo: link to git for the structure, when it appears in develop).

**How to read the individual records**:

  * with a number of ''5000''  records saved in the master chart, the insert of one record into the master chart took ''9ms''.
  * with a number of ''1943455''  records saved in the detail chart, the size of all the indexes for the detail chart was ''171MB''.
  * with a number of ''19505030''  records saved in the detail chart, the PK index size for the detail chart was ''417MB''.

^''bigint'' as the primary key in PostgreSQL 9.4  ^^^| | | | |
^Number of records (master)^Number of records (detail)  ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)|
|5000|96995|9ms|10ms|1ms|0,5MB|8MB|2MB|
|10000|203597|11ms|11ms|2ms|1MB|18MB|4MB|
|50000|967475|11ms|11ms|2ms|4MB|85MB|19.5MB|
|100000|1943455|11ms|12ms|2ms|8MB|171MB|39MB|
|1000000|''19505030'' |11ms|''12ms'' |2.5ms|73MB|1703MB|''417MB'' |

^''binary(16)'' representing uuid as the primary key in PostgreSQL 9.4  ^^^| | | | |
^Number of records (master)^Number of records (detail)  ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)|
|5000|97652|11ms|12ms|2.5ms|1MB|16M|4.5MB|
|10000|194804|11.2ms|13,5ms|3ms|1.5MB|31MB|9MB|
|50000|973836|11.4ms|13.5ms|3ms|5MB|156MB|45.5MB|
|100000|1947062|11.7ms|14.5ms|3ms|10MB|313MB|95.2MB|
|1000000|''19380690'' |30ms|''30ms'' |3.5ms|68MB|2653MB|''750MB'' |

The testing took place on localhost (16GB RAM, 2 processors, regular HDD, default database configuration, logging turned off), so the individual times are not fully representative - it is the stability that matters the most. With the increasing number of records, the disadvantage of the non-sequential uuid PK starts to come through, when there are more accesses to the disk - work with different pages (the default set-up size of one page is 8 kB - can contain approx. 200 uuid) in which the index itself is saved.

After replacement of HDD by SSD:

| | | | | |
^''binary(16)'' representing uuid as the primary key in PostgreSQL 9.4 - SSD disk  ^^^^^^^|
^Number of records (master)^Number of records (detail)  ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)|
|1000000|''19 380 690'' |5ms|''5ms'' |3.5ms|68MB|2653MB|''750MB'' |
|1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB|
|1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB|
|1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB|

The use of ''binary(16)''  identifiers in PostgreSQL 9.4 on the number of records tested **is**  (~40M) **stable**.