Adjust the values for your environment as needed based on your load tests.
== Performance recommendations
[WARNING]
====
* Performance will be lowered when scaling to more Pods (due to additional overhead) and using a cross-datacenter setup (due to additional traffic and operations).
* Increased cache sizes can improve the performance when {project_name} instances running for a longer time.
This will decrease response times and reduce IOPS on the database.
Still, those caches need to be filled when an instance is restarted, so do not set resources too tight based on the stable state measured once the caches have been filled.
* In containers, Keycloak allocates 70% of the memory limit for heap based memory. It will also use approximately 300 MB of non-heap-based memory.
To calculate the requested memory, use the calculation above. As memory limit, subtract the non-heap memory from the value above and divide the result by 0.7.
* Leave 200% extra head-room for CPU usage to handle spikes in the load.
This ensures a fast startup of the node, and sufficient capacity to handle failover tasks like, for example, re-balancing Infinispan caches, when one node fails.
Performance of {project_name} dropped significantly when its Pods were throttled in our tests.
{project_name}, which by default stores user sessions in the database, requires the following resources for optimal performance on an Aurora PostgreSQL multi-AZ database:
For every 100 login/logout/refresh requests per second:
- Budget for 1400 Write IOPS.
- Allocate between 0.35 and 0.7 vCPU.
The vCPU requirement is given as a range, as with an increased CPU saturation on the database host the CPU usage per request decreased while the response times increase. A lower CPU quota on the database can lead to slower response times during peak loads. Choose a larger CPU quota if fast response times during peak loads are critical. See below for an example.
(45 logins per second = 3 vCPU, 600 client credential grants per second = 3 vCPU, 345 refresh token = 3 vCPU. This sums up to 9 vCPU total. With 3 Pods running in the cluster, each Pod then requests 3 vCPU)
* Aurora Database instance: either `db.t4g.large` or `db.t4g.xlarge` depending on the required response times during peak loads.
+
(45 logins per second, 5 logouts per second, 360 refresh tokens per seconds.
This sums up to 410 requests per second.
This expected DB usage is 1.4 to 2.8 vCPU, with a DB idle load of 0.3 vCPU.
This indicates either a 2 vCPU `db.t4g.large` instance or a 4 vCPU `db.t4g.xlarge` instance.
A 2 vCPU `db.t4g.large` would be more cost-effective if the response times are allowed be higher during peak usage.
In our tests, the median response time for a login and a token refresh increased by up to 120 ms once the CPU saturation reached 90% on a 2 vCPU `db.t4g.large` instance given this scenario.
For faster response times during peak usage, consider a 4 vCPU `db.t4g.xlarge` instance for this scenario.)
Keep the CPU requests and limits as above for the second site. This way any remaining site can take over the traffic from the primary site immediately without the need to scale.
Reduce the CPU requests and limits as above by 50% for the second site. When one of the sites fails, scale the remaining site from 3 Pod to 6 Pods either manually, automated, or using a Horizontal Pod Autoscaler. This requires enough spare capacity on the cluster or cluster auto-scaling capabilities.
Reduce the CPU requests by 50% for the second site, but keep the CPU limits as above. This way the remaining site can take the traffic but only at the downside the Nodes will experience a CPU pressure and therefore slower response times during peak traffic.
The benefit of this setup is that the number of Pod do not need to scale during failovers which is simpler to set up.
* Database Amazon Aurora PostgreSQL in a multi-AZ setup.
* Default user password hashing with Argon2 and 5 hash iterations and minimum memory size 7 MiB https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#argon2id[as recommended by OWASP] (which is the default).
* Client credential grants do not use refresh tokens (which is the default).
* Database seeded with 20,000 users and 20,000 clients.
* Infinispan local caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.