====== Database design ======
{{tag>database}}
[[.:indexmenu_n_40?media=devel:documentation:architecture:dev:indexmenu_n_40|indexmenu_n_40]] [[..:..:conventions:dev:database-conventions|]]
===== Database scripts =====
For the automatic creation and modification of the database scheme, the [[.:flyway|]] is used.
===== JPA metamodel =====
We are using generated jpa metamodel by [[https://github.com/bcvsolutions/CzechIdMng/tree/develop/Realization/backend#setup-jpa-metamodel-generation-in-eclipse|configuration]], where criteria api is used.
...
// id
if (filter.getId() != null) {
predicates.add(builder.equal(root.get(AbstractEntity_.id), filter.getId()));
}
...
===== Primary keys (bigint vs. uuid) =====
Pros and cons of the use of uuid as the primary key for the database.
* **Pros**:
* **Easier data migration** between the environments (e.g. cloning the pre-production to production)
* generating the identifier on the part of the client (distributed system, offline access)
* impossibility of tipping the url (the guid is not sequential)
* **Cons**:
* double size of the indexes compared to ''bigint'' (''8B'' vs ''16B'') - for numbers, see the next chapter with testing
* the uuid is unreadable at the first sight directly in the database - the uuid is saved in the data type ''binary(16)''. If the uuid was saved to be readable, this would imply more disadvantages:
* uuid native data type for the databse in question (e.g. uuid in posqresql) would result in the need to maintain a set of entities according to the chosen database on the application level
* ''string'' - the size of the column would be of ''char(36)'', which would make the indexes 4 times bigger compared to the use of data type ''bigint''
* some databases always save the primary key together with the data (clustered index, MySQL/InnoDB), which may lead to data fragmentation
* illegible / unmemorable url
* Different uuid representation [[https://stackoverflow.com/questions/41651681/different-representation-of-uuid-in-java-hibernate-and-sql-server|in application and MSSQL database]].
Listing and searching according to the string guid in PostgreSQL:
SELECT encode(i.id, 'hex')::uuid, i.* FROM idm_identity i WHERE i.id = uuid_send('5cd7ec1c-6b34-45b7-9835-e533e9374ebf'::uuid)
Generating uuid without uuid-ossp package instaled in PosgreSQL:
INSERT INTO idm_configuration(id, created, creator, name, VALUE, secured, confidential) VALUES (
decode(md5(random()::text || clock_timestamp()::text), 'hex'),
now(),
'[SYSTEM]',
'idm.sec.core.tree.defaultType',
(SELECT id FROM idm_tree_type WHERE default_tree_type = TRUE),
TRUE,
FALSE);
==== Primary keys testing ====
The aim of the testing should be to verify the steady time of data record / reading in the chart with an increasing number of records and the size of the charts and indexes. Two charts in the binding 1:N (master/detail) with the primary keys, indexes for the uniqueness of the record name (master name and detail name in one master) and for the index of the relation between the masters - detail for searching all the details of the master in question were created (todo: link to git for the structure, when it appears in develop).
**How to read the individual records**:
* with a number of ''5000'' records saved in the master chart, the insert of one record into the master chart took ''9ms''.
* with a number of ''1943455'' records saved in the detail chart, the size of all the indexes for the detail chart was ''171MB''.
* with a number of ''19505030'' records saved in the detail chart, the PK index size for the detail chart was ''417MB''.
^''bigint'' as the primary key in PostgreSQL 9.4 ^^^| | | | |
^Number of records (master)^Number of records (detail) ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)|
|5000|96995|9ms|10ms|1ms|0,5MB|8MB|2MB|
|10000|203597|11ms|11ms|2ms|1MB|18MB|4MB|
|50000|967475|11ms|11ms|2ms|4MB|85MB|19.5MB|
|100000|1943455|11ms|12ms|2ms|8MB|171MB|39MB|
|1000000|''19505030'' |11ms|''12ms'' |2.5ms|73MB|1703MB|''417MB'' |
^''binary(16)'' representing uuid as the primary key in PostgreSQL 9.4 ^^^| | | | |
^Number of records (master)^Number of records (detail) ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)|
|5000|97652|11ms|12ms|2.5ms|1MB|16M|4.5MB|
|10000|194804|11.2ms|13,5ms|3ms|1.5MB|31MB|9MB|
|50000|973836|11.4ms|13.5ms|3ms|5MB|156MB|45.5MB|
|100000|1947062|11.7ms|14.5ms|3ms|10MB|313MB|95.2MB|
|1000000|''19380690'' |30ms|''30ms'' |3.5ms|68MB|2653MB|''750MB'' |
The testing took place on localhost (16GB RAM, 2 processors, regular HDD, default database configuration, logging turned off), so the individual times are not fully representative - it is the stability that matters the most. With the increasing number of records, the disadvantage of the non-sequential uuid PK starts to come through, when there are more accesses to the disk - work with different pages (the default set-up size of one page is 8 kB - can contain approx. 200 uuid) in which the index itself is saved.
After replacement of HDD by SSD:
| | | | | |
^''binary(16)'' representing uuid as the primary key in PostgreSQL 9.4 - SSD disk ^^^^^^^|
^Number of records (master)^Number of records (detail) ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)|
|1000000|''19 380 690'' |5ms|''5ms'' |3.5ms|68MB|2653MB|''750MB'' |
|1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB|
|1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB|
|1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB|
The use of ''binary(16)'' identifiers in PostgreSQL 9.4 on the number of records tested **is** (~40M) **stable**.