====== Database design ====== {{tag>database}} [[.:indexmenu_n_40?media=devel:documentation:architecture:dev:indexmenu_n_40|indexmenu_n_40]] [[..:..:conventions:dev:database-conventions|]] ===== Database scripts ===== For the automatic creation and modification of the database scheme, the [[.:flyway|]] is used. ===== JPA metamodel ===== We are using generated jpa metamodel by [[https://github.com/bcvsolutions/CzechIdMng/tree/develop/Realization/backend#setup-jpa-metamodel-generation-in-eclipse|configuration]], where criteria api is used. ... // id if (filter.getId() != null) { predicates.add(builder.equal(root.get(AbstractEntity_.id), filter.getId())); } ... ===== Primary keys (bigint vs. uuid) ===== Pros and cons of the use of uuid as the primary key for the database. * **Pros**: * **Easier data migration** between the environments (e.g. cloning the pre-production to production) * generating the identifier on the part of the client (distributed system, offline access) * impossibility of tipping the url (the guid is not sequential) * **Cons**: * double size of the indexes compared to ''bigint'' (''8B'' vs ''16B'') - for numbers, see the next chapter with testing * the uuid is unreadable at the first sight directly in the database - the uuid is saved in the data type ''binary(16)''. If the uuid was saved to be readable, this would imply more disadvantages: * uuid native data type for the databse in question (e.g. uuid in posqresql) would result in the need to maintain a set of entities according to the chosen database on the application level * ''string'' - the size of the column would be of ''char(36)'', which would make the indexes 4 times bigger compared to the use of data type ''bigint'' * some databases always save the primary key together with the data (clustered index, MySQL/InnoDB), which may lead to data fragmentation * illegible / unmemorable url * Different uuid representation [[https://stackoverflow.com/questions/41651681/different-representation-of-uuid-in-java-hibernate-and-sql-server|in application and MSSQL database]]. Listing and searching according to the string guid in PostgreSQL: SELECT encode(i.id, 'hex')::uuid, i.* FROM idm_identity i WHERE i.id = uuid_send('5cd7ec1c-6b34-45b7-9835-e533e9374ebf'::uuid) Generating uuid without uuid-ossp package instaled in PosgreSQL: INSERT INTO idm_configuration(id, created, creator, name, VALUE, secured, confidential) VALUES ( decode(md5(random()::text || clock_timestamp()::text), 'hex'), now(), '[SYSTEM]', 'idm.sec.core.tree.defaultType', (SELECT id FROM idm_tree_type WHERE default_tree_type = TRUE), TRUE, FALSE); ==== Primary keys testing ==== The aim of the testing should be to verify the steady time of data record / reading in the chart with an increasing number of records and the size of the charts and indexes. Two charts in the binding 1:N (master/detail) with the primary keys, indexes for the uniqueness of the record name (master name and detail name in one master) and for the index of the relation between the masters - detail for searching all the details of the master in question were created (todo: link to git for the structure, when it appears in develop). **How to read the individual records**: * with a number of ''5000'' records saved in the master chart, the insert of one record into the master chart took ''9ms''. * with a number of ''1943455'' records saved in the detail chart, the size of all the indexes for the detail chart was ''171MB''. * with a number of ''19505030'' records saved in the detail chart, the PK index size for the detail chart was ''417MB''. ^''bigint'' as the primary key in PostgreSQL 9.4 ^^^| | | | | ^Number of records (master)^Number of records (detail) ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)| |5000|96995|9ms|10ms|1ms|0,5MB|8MB|2MB| |10000|203597|11ms|11ms|2ms|1MB|18MB|4MB| |50000|967475|11ms|11ms|2ms|4MB|85MB|19.5MB| |100000|1943455|11ms|12ms|2ms|8MB|171MB|39MB| |1000000|''19505030'' |11ms|''12ms'' |2.5ms|73MB|1703MB|''417MB'' | ^''binary(16)'' representing uuid as the primary key in PostgreSQL 9.4 ^^^| | | | | ^Number of records (master)^Number of records (detail) ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)| |5000|97652|11ms|12ms|2.5ms|1MB|16M|4.5MB| |10000|194804|11.2ms|13,5ms|3ms|1.5MB|31MB|9MB| |50000|973836|11.4ms|13.5ms|3ms|5MB|156MB|45.5MB| |100000|1947062|11.7ms|14.5ms|3ms|10MB|313MB|95.2MB| |1000000|''19380690'' |30ms|''30ms'' |3.5ms|68MB|2653MB|''750MB'' | The testing took place on localhost (16GB RAM, 2 processors, regular HDD, default database configuration, logging turned off), so the individual times are not fully representative - it is the stability that matters the most. With the increasing number of records, the disadvantage of the non-sequential uuid PK starts to come through, when there are more accesses to the disk - work with different pages (the default set-up size of one page is 8 kB - can contain approx. 200 uuid) in which the index itself is saved. After replacement of HDD by SSD: | | | | | | ^''binary(16)'' representing uuid as the primary key in PostgreSQL 9.4 - SSD disk ^^^^^^^| ^Number of records (master)^Number of records (detail) ^Insert (master)^Insert (detail)^Load details by master^Index Size (master)^Index size (detail)^PK size (detail)| |1000000|''19 380 690'' |5ms|''5ms'' |3.5ms|68MB|2653MB|''750MB'' | |1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB| |1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB| |1500000|28 732 177|5ms|5ms|4ms|134MB|4707MB|1130MB| The use of ''binary(16)'' identifiers in PostgreSQL 9.4 on the number of records tested **is** (~40M) **stable**.