Upgrade Keycloak's sizing guide for KC26 (#32344)

Closes #32343

Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
Co-authored-by: Kamesh Akella <kakella@redhat.com>
2024-08-27 16:38:38 +02:00
commit 8e0d50edc0
parent cec359f0a2
1 changed files with 69 additions and 36 deletions
--- a/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc
+++ b/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc
@@ -6,7 +6,7 @@ title="Concepts for sizing CPU and memory resources"
 summary="Understand these concepts to avoid resource exhaustion and congestion"
 tileVisible="false" >

-Use this as a starting point to size a product environment. 
+Use this as a starting point to size a product environment.
 Adjust the values for your environment as needed based on your load tests.

 == Performance recommendations
@@ -29,83 +29,116 @@ Summary:

 Recommendations:

-* The base memory usage for an inactive Pod is 1000 MB of RAM.
-
-* For each 100,000 active user sessions, add 500 MB per Pod in a three-node cluster (tested with up to 200,000 sessions).
-+
-This assumes that each user connects to only one client.
-Memory requirements increase with the number of client sessions per user session (not tested yet).
+* The base memory usage for a Pod including caches of Realm data and 10,000 cached sessions is 1250 MB of RAM.

 * In containers, Keycloak allocates 70% of the memory limit for heap based memory. It will also use approximately 300 MB of non-heap-based memory.
 To calculate the requested memory, use the calculation above. As memory limit, subtract the non-heap memory from the value above and divide the result by 0.7.

-* For each 45 password-based user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+* For each 15 password-based user logins per second, allocate 1 vCPU to the cluster (tested with up to 300 per second).
 +
 {project_name} spends most of the CPU time hashing the password provided by the user, and it is proportional to the number of hash iterations.

-* For each 500 client credential grants per second, 1 vCPU per Pod in a three node cluster (tested with up to 2000 per second).
+* For each 200 client credential grants per second, allocate 1 vCPU to the cluster (tested with up to 2000 per second).
 +
 Most CPU time goes into creating new TLS connections, as each client runs only a single request.

-* For each 350 refresh token requests per second, 1 vCPU per Pod in a three-node cluster (tested with up to 435 refresh token requests per second).
+* For each 120 refresh token requests per second, allocate 1 vCPU to the cluster (tested with up to 435 refresh token requests per second).

 * Leave 200% extra head-room for CPU usage to handle spikes in the load.
 This ensures a fast startup of the node, and sufficient capacity to handle failover tasks like, for example, re-balancing Infinispan caches, when one node fails.
 Performance of {project_name} dropped significantly when its Pods were throttled in our tests.

-=== Calculation example
+{project_name}, which by default stores user sessions in the database, requires the following resources for optimal performance on an Aurora PostgreSQL multi-AZ database:
+
+For every 100 login/logout/refresh requests per second:
+
+- Budget for 1400 Write IOPS.
+
+- Allocate between 0.35 and 0.7 vCPU.
+
+The vCPU requirement is given as a range because, as CPU saturation on the database host increases, the CPU usage per request decreases while the response times increase. A lower CPU quota on the database can lead to slower response times during peak loads. Choose a larger CPU quota if fast response times during peak loads are critical. See below for an example.
+
+=== Calculation example (single site)

 Target size:

-* 50,000 active user sessions
-* 45 logins per seconds
-* 500 client credential grants per second
-* 350 refresh token requests per second
+* 45 logins and logouts per second
+* 600 client credential grants per second
+* 360 refresh token requests per second (1:8 ratio for logins)
+* 3 Pods

 Limits calculated:

-* CPU requested: 3 vCPU
+* CPU requested per Pod: 3 vCPU
 +
-(45 logins per second = 1 vCPU, 500 client credential grants per second = 1 vCPU, 350 refresh token = 1 vCPU)
+(45 logins per second = 3 vCPU, 600 client credential grants per second = 3 vCPU, 360 refresh token requests per second = 3 vCPU. This sums up to 9 vCPU total. With 3 Pods running in the cluster, each Pod then requests 3 vCPU)

-* CPU limit: 9 vCPU
+* CPU limit per Pod: 9 vCPU
 +
 (Allow for three times the CPU requested to handle peaks, startups and failover tasks)

-* Memory requested: 1250 MB
+* Memory requested per Pod: 1250 MB
 +
-(1000 MB base memory plus 250 MB RAM for 50,000 active sessions)
+(1250 MB base memory)

-* Memory limit: 1360 MB
+* Memory limit per Pod: 1360 MB
 +
 (1250 MB expected memory usage minus 300 non-heap-usage, divided by 0.7)

-== Persistent user sessions
+* Aurora Database instance: either `db.t4g.large` or `db.t4g.xlarge` depending on the required response times during peak loads.
+
+(45 logins per second, 5 logouts per second, 360 refresh tokens per second.
+This sums up to 410 requests per second.
+The expected DB usage is 1.4 to 2.8 vCPU, with a DB idle load of 0.3 vCPU.
+This indicates either a 2 vCPU `db.t4g.large` instance or a 4 vCPU `db.t4g.xlarge` instance.
+A 2 vCPU `db.t4g.large` would be more cost-effective if the response times are allowed to be higher during peak usage.
+In our tests, the median response time for a login and a token refresh increased by up to 120 ms once the CPU saturation reached 90% on a 2 vCPU `db.t4g.large` instance given this scenario.
+For faster response times during peak usage, consider a 4 vCPU `db.t4g.xlarge` instance for this scenario.)

-When using persistent sessions, {project_name} will use less memory as it will keep only a subset of sessions in memory.
-At the same time it will use more CPU resources to communicate with the database, and it will use a lot more CPU and write IO on the database to keep the session information up to date.
+////
+<#noparse>

-A performance test showed the following per 100 requests/second that update the database (login, logout, refresh token), tested up to 300 requests per second:
+./benchmark.sh eu-west-1 --scenario=keycloak.scenario.authentication.AuthorizationCode --server-url=${KEYCLOAK_URL} --realm-name=realm-0 --users-per-sec=45 --ramp-up=10 --refresh-token-period=2 --refresh-token-count=8 --logout-percentage=10 --measurement=600 --users-per-realm=20000 --log-http-on-failure

-* 0.25 vCPU on each Pod in a 3-node Keycloak cluster.
-* 0.25 CPU and 800 WriteIOPS on a Aurora PostgreSQL multi-AZ database base on a `db.t4g.large` instance type.
+</#noparse>
+////

-The average latency of the requests increased by 20-40 ms when running on an Aurora PostgreSQL multi-AZ database with a single reader instance on another AZ.
-The latency is expected to be lower when running in a single AZ or non-replicated database.
+=== Sizing a multi-site setup
+
+To create the sizing for an active-active Keycloak setup with two AZs in one AWS region, follow these steps:
+
+* Create the same number of Pods with the same memory sizing as above on the second site.
+
+* The database sizing remains unchanged. Both sites will connect to the same database writer instance.
+
+In regard to the sizing of CPU requests and limits, there are different approaches depending on the expected failover behavior:
+
+Fast failover and more expensive::
+Keep the CPU requests and limits as above for the second site. This way any remaining site can take over the traffic from the primary site immediately without the need to scale.
+
+Slower failover and more cost-effective::
+Reduce the CPU requests and limits as above by 50% for the second site. When one of the sites fails, scale the remaining site from 3 Pods to 6 Pods either manually, through automation, or using a Horizontal Pod Autoscaler. This requires enough spare capacity on the cluster or cluster auto-scaling capabilities.
+
+Alternative setup for some environments::
+Reduce the CPU requests by 50% for the second site, but keep the CPU limits as above. This way the remaining site can take the traffic, but with the downside that the Nodes will experience CPU pressure and therefore slower response times during peak traffic.
+The benefit of this setup is that the number of Pods does not need to scale during failovers, which is simpler to set up.

 == Reference architecture

 The following setup was used to retrieve the settings above to run tests of about 10 minutes for different scenarios:

-* OpenShift 4.14.x deployed on AWS via ROSA.
+* OpenShift 4.15.x deployed on AWS via ROSA.
 * Machinepool with `m5.4xlarge` instances.
-* {project_name} deployed with the Operator and 3 pods in a high-availability setup with two sites in active/passive mode.
+* {project_name} deployed with the Operator and 3 pods in a high-availability setup with two sites in active/active mode.
 * OpenShift's reverse proxy running in passthrough mode where the TLS connection of the client is terminated at the Pod.
-* Database Amazon Aurora PostgreSQL in a multi-AZ setup, with the writer instance in the availability zone of the primary site.
-* Default user password hashing with Argon2 and 5 hash iterations and minimum memory size 7 MiB https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#argon2id[as recommended by OWASP].
-* Client credential grants don't use refresh tokens (which is the default).
-* Database seeded with 100,000 users and 100,000 clients.
+* Database Amazon Aurora PostgreSQL in a multi-AZ setup.
+* Default user password hashing with Argon2 and 5 hash iterations and minimum memory size 7 MiB https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#argon2id[as recommended by OWASP] (which is the default).
+* Client credential grants do not use refresh tokens (which is the default).
+* Database seeded with 20,000 users and 20,000 clients.
 * Infinispan local caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
-* All sessions in distributed caches as per default, with two owners per entries, allowing one failing Pod without losing data.
+* All authentication sessions in distributed caches as per default, with two owners per entry, allowing one failing Pod without losing data.
+* All user and client sessions are stored in the database and are not cached in-memory as this was tested in a multi-site setup.
+Expect a slightly higher performance for single-site setups as a fixed number of user and client sessions will be cached.
+* OpenJDK 21

 </@tmpl.guide>
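
Note on the calculation example: the arithmetic in the updated guide can be checked with a small script. The following is a minimal sketch, not part of the commit. The function names `size_keycloak_pods` and `size_database` are hypothetical, and rounding the memory limit up to the next 10 MB is an assumption made to match the guide's 1360 MB figure. All rates are taken from the added lines of the diff.

```python
import math

# Requests per second that 1 vCPU handles across the cluster (from the guide).
LOGINS_PER_VCPU = 15
CLIENT_CREDENTIAL_GRANTS_PER_VCPU = 200
REFRESH_TOKENS_PER_VCPU = 120

BASE_MEMORY_MB = 1250   # per Pod, incl. realm caches and 10,000 cached sessions
NON_HEAP_MB = 300       # approximate non-heap memory
HEAP_FRACTION = 0.7     # share of the memory limit used for heap in containers
CPU_LIMIT_FACTOR = 3    # request plus 200% head-room

DB_VCPU_PER_100_RPS = (0.35, 0.7)   # range per 100 login/logout/refresh requests
DB_WRITE_IOPS_PER_100_RPS = 1400


def size_keycloak_pods(logins, grants, refreshes, pods):
    """Return per-Pod CPU and memory figures for the given per-second rates."""
    cluster_vcpu = (
        logins / LOGINS_PER_VCPU
        + grants / CLIENT_CREDENTIAL_GRANTS_PER_VCPU
        + refreshes / REFRESH_TOKENS_PER_VCPU
    )
    cpu_request = cluster_vcpu / pods
    raw_limit_mb = (BASE_MEMORY_MB - NON_HEAP_MB) / HEAP_FRACTION
    return {
        "cpu_request_per_pod": cpu_request,
        "cpu_limit_per_pod": cpu_request * CPU_LIMIT_FACTOR,
        "memory_request_mb": BASE_MEMORY_MB,
        # Assumption: round up to the next 10 MB, as the guide appears to do.
        "memory_limit_mb": math.ceil(raw_limit_mb / 10) * 10,
    }


def size_database(logins, logouts, refreshes):
    """Return the database vCPU range and write IOPS budget (idle load excluded)."""
    rps = logins + logouts + refreshes
    factor = rps / 100
    low, high = DB_VCPU_PER_100_RPS
    return {
        "requests_per_second": rps,
        "vcpu_low": factor * low,
        "vcpu_high": factor * high,
        "write_iops": round(factor * DB_WRITE_IOPS_PER_100_RPS),
    }


if __name__ == "__main__":
    print(size_keycloak_pods(logins=45, grants=600, refreshes=360, pods=3))
    print(size_database(logins=45, logouts=5, refreshes=360))
```

For the single-site target in the guide (45 logins, 600 client credential grants, 360 refresh token requests per second, 3 Pods), this gives a CPU request of 3 vCPU and a CPU limit of 9 vCPU per Pod, and a memory limit of 1360 MB. The database estimate for 410 requests per second comes out at roughly 1.4 to 2.9 vCPU before adding the 0.3 vCPU idle load, which the guide states as 1.4 to 2.8 vCPU.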