Server updates - Apache Tomcat updates
This tutorial shows how to update Apache Tomcat in your CzechIdM deployment. In this particular how-to, we are updating from Tomcat 8.5.11 to Tomcat 8.5.50. If necessary, adjust the commands to your needs.
Things to consider
The Apache Tomcat application container is a part of the CzechIdM stack and its update requires some service downtime. When done correctly, this downtime can be a matter of minutes, but you should always plan for the worst. :) As always, there are a number of things to consider.
- Impact on users
- IdM is often deployed as a self-service portal for users. You should plan the downtime so that as few users as possible are affected.
- Users may make changes in the IdM that start some long running tasks (e.g. automatic roles changes, bulk role assignments, etc.). Those tasks are executed asynchronously and may be running even if the user who started the task has already logged off.
- Impact on long running tasks (LRT)
- IdM has an internal cron that schedules LRT jobs. To be safe, no job should be running while you are doing the update. The safest way to achieve this is to stop the IdM service before applying updates.
- LRTs usually run at night, so it is not strictly necessary to stop IdM, but you have to make sure you have enough time to perform the patching (and a possible rollback) before the jobs start to execute.
- Restarting IdM cancels any LRT that is running at that moment. The LRT will not resume automatically after IdM comes up again.
- Nightly LRTs usually read HR system data. This means there are dependencies between them (e.g. synchronize identities, then contracts and/or time slices, then run recompute on them and finally run HR processes which enable/disable identities based on freshly synchronized data). Depending on the nature of the deployment, those dependencies may be "hard" and it may be dangerous to skip some LRTs or run them in a different order.
- Impact on entity events
- Entity events that are currently running are lost on IdM restart. This usually affects from one to ten events; the actual number of affected events depends on the number of event-executor threads.
- Entity events in other states are persisted into the database so they are not lost on IdM restart.
- No entity events should be in the event queue at the time of the update. Because events are generated by LRTs or user actions, killing off LRTs and disconnecting users from the IdM web interface is sufficient.
- Impact on end systems connected to IdM
- There is no direct impact on other systems.
- There may be some impact on "systems" connected as CSVs, for example. Those integrations may base their functionality on some Apache project library that will get updated alongside the Tomcat. You should define tests for such cases.
- Impact on CzechIdM
- CzechIdM uses some libraries from the Apache project that are distributed with Tomcat, not to mention the application container itself.
- Some modules working with HTTP requests can stop working.
- A known case here is the OpenAM authentication module: some versions of Tomcat consider a cookie domain with a leading dot (e.g. .domain.tld) invalid, but unfortunately those are the session cookies given out by OpenAM upon successful user authentication. (This issue has already been solved, but it is still a great example.)
- Data in CzechIdM database will not get changed/broken during the Tomcat upgrade. You can safely return to previous Tomcat version without restoring the database.
- Impact on OS
- There is no impact on underlying operating system.
- Finding bugs
- It is best to have at least two environments: a test environment and a production environment.
- Update the test environment first, then leave it running for at least one week. If no bugs are found by then, you can update the production environment. One week is the minimal safe time frame in which some of the bugs can manifest (e.g. memory leaks).
- Define use-cases that are important for your deployment. Before and after the update, test if those use-cases work.
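One way to make the before/after use-case testing repeatable is a small helper that runs each check and prints a uniform result line. The following is only a sketch: the function name run_check and the URL in the commented example are hypothetical and not part of CzechIdM, so replace them with checks that match your own deployment.

```shell
#!/bin/sh
# run_check NAME COMMAND [ARGS...]
# Runs COMMAND quietly and prints "PASS NAME" or "FAIL NAME".
# Returns non-zero when the check fails, so callers can react to it.
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $name"
    else
        echo "FAIL $name"
        return 1
    fi
}

# Example use (hypothetical URL; replace with your own use-cases):
#   run_check "IdM web UI responds" curl -fsS --max-time 10 https://idm.example.tld/idm/
#   run_check "Tomcat service is active" systemctl is-active --quiet tomcat
```

Running the same set of checks before the update and again after it gives you two outputs that can be compared line by line.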
Performing the Tomcat update
The following list can be used as a basis for the maintenance checklist. Feel free to customize it to better suit your needs. You can safely perform almost all steps of this guide on a running system and then just switch the Tomcat distribution.
- Download the new Tomcat binaries from the web; in our case this is version 8.5.50.
- Unpack the new Tomcat into /opt/tomcat alongside the existing installation.
- Set correct permissions for the new Tomcat (they are similar to those you set during the clean installation of Tomcat).
    cd /opt/tomcat/apache-tomcat-8.5.50
    chmod -vR o+rX ./
    chgrp -R tomcat conf/ bin/ lib/
    chmod g+rwx conf/
    chmod g+r conf/*
    chown -R tomcat webapps/ work/ temp/ logs/
    cd /opt/tomcat
    chgrp tomcat apache-tomcat-8.5.50/
- Customize / copy over the configuration from the previous Tomcat installation. This means at least:
- Adjusting the new server.xml: disabling the shutdown port and binding the AJP (8009/tcp) and HTTP (8080/tcp) ports to localhost.
- Removing the contents of the new Tomcat's webapps/ directory.
- To perform these steps, see the installation guide for details.
- Link-in the PostgreSQL JDBC driver.
    cd /opt/tomcat/apache-tomcat-8.5.50/lib/
    ln -sv /usr/share/java/postgresql-jdbc.jar
- Create / copy over the setenv.sh file from the old Tomcat's bin/ directory to the bin/ of the new Tomcat. Make sure the file has correct permissions.
- Copy over the idm.war file from the old Tomcat's webapps/ directory to the webapps/ of the new Tomcat.
- THE DOWNTIME STARTS HERE.
- Disconnect all users, kill LRT, etc. and start the maintenance.
- Disable monitoring system notifications.
- Switch to new Tomcat.
    systemctl stop tomcat
    # here you should make a database backup; you should not need it at all, just to be safe
    cd /opt/tomcat
    unlink current
    ln -s apache-tomcat-8.5.50 current
    systemctl start tomcat
- Check logs to see if the application started properly.
- Perform testing with selected use-cases, test connected end systems, etc.
- If you try to connect through Apache HTTPd and it shows a "bad gateway" error, give the HTTPd a reload/restart (or just wait a while longer). The web server uses a heartbeat interval to check whether the upstream (i.e. Tomcat) is alive, and it may simply think the application container is not running yet.
- THE DOWNTIME ENDS HERE.
- Enable monitoring system notifications.
- (After a week or so, if everything runs fine.) Clean up the previous installation of Tomcat, move application logs you want to preserve to a new destination.
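Before the downtime starts, it can help to verify that the prepared Tomcat directory really contains everything the checklist above asks for. The following is a minimal sketch, assuming the file names used in this guide (setenv.sh, idm.war, the postgresql-jdbc.jar link and server.xml); the function name preflight is hypothetical.

```shell
#!/bin/sh
# preflight DIR
# Checks a prepared Tomcat directory for the files named in the checklist
# and reports anything in webapps/ other than idm.war.
# Returns non-zero if something is missing or unexpected.
preflight() {
    dir=$1
    rc=0
    for f in bin/setenv.sh webapps/idm.war lib/postgresql-jdbc.jar conf/server.xml; do
        # -e follows symlinks, so a dangling JDBC driver link counts as missing
        if [ -e "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            rc=1
        fi
    done
    # webapps/ was emptied in an earlier step, so only idm.war should be there
    extra=$(ls -A "$dir/webapps" 2>/dev/null | grep -v '^idm\.war$' || true)
    if [ -n "$extra" ]; then
        echo "UNEXPECTED in webapps/: $extra"
        rc=1
    fi
    return $rc
}

# Example use:
#   preflight /opt/tomcat/apache-tomcat-8.5.50
```

The check is meant for a directory that has not been started yet; once Tomcat runs, it unpacks idm.war into webapps/, which this sketch would report as unexpected.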
Resolving issues
Rolling back from an unsuccessful update means just swapping back to the old Tomcat installation. It requires some downtime. (In this example, we are returning to the 8.5.11 version of Tomcat.)
- THE DOWNTIME STARTS HERE.
- Disconnect all users, kill LRT, etc. and start the maintenance.
- Disable monitoring system notifications.
- Switch back to old Tomcat.
    systemctl stop tomcat
    cd /opt/tomcat
    unlink current
    ln -s apache-tomcat-8.5.11 current
    systemctl start tomcat
- Check logs to see if the application started properly.
- Perform testing with selected use-cases, test connected end systems, etc. to make sure everything works as expected.
- THE DOWNTIME ENDS HERE.
- Enable monitoring system notifications.
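Both the update and the rollback come down to repointing the current symlink while Tomcat is stopped. The sketch below wraps that step in a function that refuses to point at a directory that does not exist and prints the previous target, which is handy to note down for a rollback. The function name switch_current is hypothetical, and it uses ln -sfn in place of the unlink / ln -s pair shown in this guide; the -n option is available in GNU coreutils.

```shell
#!/bin/sh
# switch_current BASE TARGET
# Repoints BASE/current at TARGET (a directory name inside BASE)
# and prints the previous link target, or "none" if there was no link.
switch_current() {
    base=$1
    target=$2
    if [ ! -d "$base/$target" ]; then
        echo "no such directory: $base/$target" >&2
        return 1
    fi
    previous=$(readlink "$base/current" 2>/dev/null || true)
    # -n: replace the existing 'current' link itself instead of
    # creating a new link inside the directory it points to
    ln -sfn "$target" "$base/current" || return 1
    echo "${previous:-none}"
}

# Example use for the rollback described above (run while Tomcat is stopped):
#   systemctl stop tomcat
#   switch_current /opt/tomcat apache-tomcat-8.5.11
#   systemctl start tomcat
```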
Troubleshooting
The error message "The server understood the request but refuses to authorize it." when coming to IdM through Apache web server means that you have to set the AJP secret
    ProxyPass / ajp://127.0.0.1:8009/ secret=**tomcat_ajp_secret**
in /etc/httpd/conf.d/ssl.conf, as written in the standard server preparation tutorial (httpd_installation_and_configuration).
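A quick way to see whether the web server configuration is missing the secret is to list the ProxyPass lines that point to an AJP backend but carry no secret= parameter. This is only a sketch with a hypothetical function name; it inspects the text of the file and does not verify that the value matches the one configured in Tomcat.

```shell
#!/bin/sh
# ajp_lines_without_secret FILE
# Prints ProxyPass lines that target ajp:// but have no secret= parameter.
# Prints nothing when every AJP ProxyPass line carries a secret.
ajp_lines_without_secret() {
    grep -E '^[[:space:]]*ProxyPass[[:space:]].*ajp://' "$1" | grep -v 'secret=' || true
}

# Example use:
#   ajp_lines_without_secret /etc/httpd/conf.d/ssl.conf
```

If the function prints a line, add the secret parameter to it and reload the web server.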