tutorial:adm:server_os_updates, revision 2020/02/17 14:34 (current) by fiserp; previous revision 2019/12/17 07:46 by fiserp
====== Server updates - OS updates ======
To ensure secure operation, servers in the infrastructure have to be kept up to date. This tutorial addresses the need for OS updates of the IdM server and gives basic guidelines and recommendations.
  
Each organization has some sort of schedule for applying OS patches: weekly, monthly, quarterly, never (not a good one), etc. You can patch the OS according to your strategy, but we recommend patching at least once every three months. IdM relies on packages and libraries from the operating system; if those are not patched, the security of the whole IdM solution deteriorates as well.
  
===== Things to consider =====
Before applying updates, there are a few things to consider:
  * Impact on users
    * LRTs usually run at night, so it is not strictly necessary to stop IdM, but you have to make sure you have enough time to perform the patching (and a possible rollback) before the jobs start to execute.
    * Restarting IdM cancels the currently running LRT; the LRT **will not resume automatically** after IdM is up again.
    * Nightly LRTs usually read HR system data, so there are dependencies between them (e.g. synchronize identities, then contracts and/or time slices, then run recompute on them and finally run HR processes which enable/disable identities based on the freshly synchronized data). Depending on the nature of the deployment, those dependencies may be "hard" and it may be dangerous to skip some of the LRTs or run them in a different order.
  * Impact on entity events
    * Entity events that are currently running **are lost** on IdM restart. This usually affects one to ten events; the actual number depends on the number of ''event-executor'' threads.
    * Entity events in other states are persisted in the database, so they are not lost on IdM restart.
    * There should be no entity events in the event queue at the time of the OS update. Because events are generated by LRTs or user actions, stopping the LRTs and disconnecting users from the IdM web interface is sufficient.
  * Impact on end systems connected to IdM
    * There is no direct impact on other systems.
    * Define the use-cases that are important for your deployment. Test whether those use-cases work both before and after the update.
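Before you rerun or reschedule jobs after the update, you can check the nightly LRT chain described above (synchronize identities, then contracts and/or time slices, then recompute, then HR processes) mechanically. The following is a minimal Python 3.9+ sketch that uses the standard ``graphlib`` module. The dependency map and LRT names are hypothetical and illustrative, not IdM identifiers. The sketch derives one safe run order and reports prerequisites that a planned order skips or runs too late.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical LRT names; replace them with the jobs of your deployment.
# Each key maps an LRT to the LRTs that must finish before it runs.
LRT_DEPENDENCIES = {
    "sync-identities": set(),
    "sync-contracts": {"sync-identities"},
    "sync-time-slices": {"sync-identities"},
    "recompute": {"sync-contracts", "sync-time-slices"},
    "hr-processes": {"recompute"},
}


def safe_order(deps):
    """Return one run order that respects every dependency (raises CycleError on cycles)."""
    return list(TopologicalSorter(deps).static_order())


def violations(planned, deps):
    """List (lrt, prerequisite) pairs that a planned order breaks.

    A prerequisite is broken when it is missing from the plan (skipped)
    or scheduled after the LRT that depends on it.
    """
    position = {lrt: i for i, lrt in enumerate(planned)}
    broken = []
    for lrt in planned:
        for prereq in sorted(deps.get(lrt, ())):
            if prereq not in position or position[prereq] > position[lrt]:
                broken.append((lrt, prereq))
    return broken
```

For example, ``violations(["recompute", "hr-processes"], LRT_DEPENDENCIES)`` reports that ``recompute`` would run without freshly synchronized contracts and time slices. If those prerequisites already completed before the restart, the report is a prompt to double-check rather than an error.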
  
===== Performing the OS update =====
The following list can be used as a basis for your maintenance checklist. Feel free to customize it to suit your needs.
  - Preparations
    - Prepare test use-cases.
  - Perform the update
    - Begin the maintenance.
    - Disable monitoring system notifications.
    - (If you use hot snapshots, make one.)
    - Make sure no user or external application can access the IdM.
    - Make a backup of ``/boot`` and ``/etc``, the list of processes (``ps -ef``) and the list of network services (``netstat -tulnp`` or ``ss -tulnp``). These dumps help you check whether all the services started again. If something goes wrong in a minor way, you can also recover some settings from the backups instead of rolling back the whole snapshot.
    - Perform the update (e.g. ``yum update``).
      - YMMV depending on the packages being updated. PostgreSQL upgrades in particular require additional steps (a major version upgrade, for example, requires migrating the data with ``pg_upgrade`` or a dump and restore).
    - Restart the affected services, or reboot the whole machine if necessary.
    - When the machine is up, check ``dmesg`` and ``/var/log/{messages,syslog}`` or the analogous files for your OS.
    - (If there were changes to the database, e.g. a PostgreSQL major version upgrade, make a backup of the upgraded database.)
    - Allow users to access the IdM.
    - Re-enable monitoring system notifications.
    - End the maintenance.
  - Wrap-up
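You can script the pre-update backup step from the checklist above. The following is a minimal Python sketch. It assumes a Linux host with ``ps`` and ``ss`` installed, and that it runs as root, which is needed to read all of ``/etc`` and to see process names for every socket in ``ss -p``. The output location ``/root/pre-update`` and the file names are hypothetical defaults.

```python
import subprocess
import tarfile
from datetime import datetime
from pathlib import Path

# Command dumps taken before the update, as listed in the checklist.
COMMANDS = {
    "processes.txt": ["ps", "-ef"],
    "network-services.txt": ["ss", "-tulnp"],  # or ["netstat", "-tulnp"]
}
# Directories archived before the update.
DIRECTORIES = ["/boot", "/etc"]


def save_command_output(cmd, target):
    """Run cmd and write its standard output to target; raise if it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    Path(target).write_text(result.stdout)


def archive_directory(source, target):
    """Pack a directory (e.g. /etc) into a gzip-compressed tar archive."""
    with tarfile.open(target, "w:gz") as tar:
        tar.add(source, arcname=Path(source).name)


def snapshot(base_dir="/root/pre-update"):
    """Store all dumps and archives in a new timestamped directory; return its path."""
    out = Path(base_dir) / datetime.now().strftime("%Y%m%d-%H%M%S")
    out.mkdir(parents=True)
    for filename, cmd in COMMANDS.items():
        save_command_output(cmd, out / filename)
    for directory in DIRECTORIES:
        archive_directory(directory, out / f"{Path(directory).name}.tar.gz")
    return out
```

Call ``snapshot()`` right before ``yum update``. The returned directory then holds the dumps you compare against after the update and the archives you can restore single files from.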
<note>For Windows OSes, the update process is roughly the same. To check services, system status and system logs, use the Event Viewer and Server Manager.</note>
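On Linux, you can compare the ``ss -tulnp`` dump saved before the update with a fresh one to spot services that did not start again. The following is a minimal Python sketch that assumes the column layout of ``ss -tulnp`` (Netid, State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process). It does not parse ``netstat`` output. Sockets are keyed by protocol and local address rather than by PID, because PIDs change after a restart.

```python
import re

# First process name inside the "users:((...))" column that `ss -p` prints.
PROCESS_RE = re.compile(r'users:\(\("([^"]+)"')


def listening_sockets(ss_output):
    """Map (netid, local address:port) -> process name for `ss -tulnp` output."""
    sockets = {}
    for line in ss_output.splitlines():
        fields = line.split()
        # The header line and blank lines do not start with a netid.
        if len(fields) < 5 or fields[0] not in ("tcp", "udp"):
            continue
        match = PROCESS_RE.search(line)
        sockets[(fields[0], fields[4])] = match.group(1) if match else "?"
    return sockets


def missing_after_update(before, after):
    """Sockets that were open before the update but are gone afterwards."""
    new = listening_sockets(after)
    return {key: proc for key, proc in listening_sockets(before).items() if key not in new}
```

Pass the saved dump and the output of a new ``ss -tulnp`` run as strings. An empty result means every socket that was open before is open again. Services that bind dynamically assigned ports may show up as false positives.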
  
===== Resolving issues =====
For maintenance actions, it is necessary to:
  * Know how long each task will take, and measure each task's duration when actually performing it.
  * Know how long (at worst) the whole rollback will take (rollback time **RT**).
  * Have a maintenance window that spans at least **MT**+**RT**, plus some extra time **ET**.
    * You cannot safely perform the maintenance in a shorter window; there is simply not enough time. If something goes wrong, you will need **RT** to perform the rollback!
    * If you have no **ET** and anything goes wrong, you have to perform the rollback procedure. **ET** gives you some time to solve the issue so that you can carry on with the updates.
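The timing rules above reduce to simple arithmetic. The following sketch assumes the tasks run one after another, so **MT** is the sum of the measured task durations. ``extra_time`` returns the remaining **ET**, which is negative when the window is too short. ``rollback_deadline`` gives the latest moment the rollback can start and still finish within the window. The function names and example figures are illustrative.

```python
from datetime import datetime, timedelta


def extra_time(task_minutes, rollback_minutes, window_minutes):
    """Return ET = window - (MT + RT); a negative value means the window is too short.

    Assumes the tasks run one after another, so MT is the sum of their durations.
    """
    maintenance_minutes = sum(task_minutes)  # MT
    return window_minutes - (maintenance_minutes + rollback_minutes)


def rollback_deadline(window_end, rollback_minutes):
    """Latest moment to start the rollback so that it still ends within the window."""
    return window_end - timedelta(minutes=rollback_minutes)


if __name__ == "__main__":
    # Illustrative figures: four tasks, a 60-minute rollback, a 22:00-01:00 window.
    print(extra_time([15, 30, 20, 25], 60, 180))  # 30
    print(rollback_deadline(datetime(2020, 2, 18, 1, 0), 60))  # 2020-02-18 00:00:00
```

In this example, MT is 90 minutes and RT is 60 minutes, which leaves 30 minutes of ET. If the update is not finished and verified by midnight, the only safe option left is the rollback.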
  
  * You should have a rollback procedure that can safely restore the deployment.
    * This depends on your environment and on the way you updated the OS packages.
    * Fortunately, in most cases it simply means restoring the snapshot of the virtual machine.
    * After restoring the snapshot, you have to perform tests (with the test use-cases) to confirm that the rollback was performed correctly.
  * Minor issues can generally be resolved with the help of the ``/boot`` and ``/etc`` backups you created before updating the OS.
  * If the IdM installation gets hit, you can debug the configuration or restore it from the periodic backup. Since IdM is not installed from OS packages, this practically never happens.
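Testing the use-cases before the update and again afterwards (or after a rollback) is easier to repeat with a small harness. The following is a minimal Python sketch. Each use-case is a hypothetical callable that returns ``True`` when the use-case works, for example a login or an identity lookup that you implement against your own deployment. An exception counts as a failure. ``regressions`` lists the use-cases that passed before but fail afterwards.

```python
from typing import Callable, Dict, List

# A use-case check returns True when the use-case works. The concrete checks
# (e.g. logging in, looking up an identity) are hypothetical and deployment-specific.
UseCase = Callable[[], bool]


def run_use_cases(use_cases: Dict[str, UseCase]) -> Dict[str, bool]:
    """Run every use-case check; an exception counts as a failure."""
    results = {}
    for name, check in use_cases.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Use-cases that passed before the update (or rollback) but fail afterwards."""
    return sorted(name for name, ok in before.items() if ok and not after.get(name, False))
```

Run the same set of checks at the start of the maintenance and again after the update or the rollback. A non-empty ``regressions`` result is the signal to start solving the issue or to roll back.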