Updated performance impact due to changed hashing

Fixes #27900

Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Author: Alexander Schwartz, 2024-03-14 14:15:30 +01:00 (committed by Alexander Schwartz)
parent 4ab4fa94fb
commit fbdb2ed9f7
4 changed files with 27 additions and 18 deletions


@@ -408,8 +408,9 @@ This can be done by configuring the hash iterations explicitly in the password p
== Expected increased overall CPU usage and temporary increased database activity
The Concepts for sizing CPU and memory resources in the {project_name} High Availability guide have been updated to reflect the new hashing defaults.
-While the CPU usage per password-based login in our tests increased by 33% (which includes both the changed password hashing and unchanged TLS connection handling), the overall CPU increase should be around 10% to 15%.
-This is due to the averaging effect of {project_name}'s other activities like refreshing access tokens and client credential grants, still this depends on the unique workload of an installation.
+The CPU usage per password-based login in our tests increased by a factor of five, which includes both the changed password hashing and unchanged TLS connection handling.
+The overall CPU increase should be around a factor of two to three due to the averaging effect of {project_name}'s other activities, such as refreshing access tokens and client credential grants.
+Still, this depends on the unique workload of an installation.
After the upgrade, during a password-based login, the user's password will be re-hashed with the new hash algorithm and hash iterations as a one-off activity and updated in the database.
As this clears the user from {project_name}'s internal cache, you will also see an increased read activity on the database level.
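The per-login cost of the new hashing defaults can be reproduced with any PBKDF2 implementation. A minimal Python sketch, assuming the PBKDF2(SHA512) algorithm with 210,000 iterations mentioned later in this file (the fixed salt and password are illustrative only; absolute timings depend on hardware, but CPU time grows linearly with the iteration count):

```python
import hashlib
import time

def hash_password(password: str, salt: bytes, iterations: int = 210_000) -> bytes:
    """PBKDF2 with SHA-512; 210,000 iterations is the new default named in this change."""
    return hashlib.pbkdf2_hmac("sha512", password.encode(), salt, iterations)

salt = b"0123456789abcdef"  # illustrative fixed salt; Keycloak generates a random one per user
start = time.perf_counter()
digest = hash_password("s3cr3t", salt)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{len(digest)}-byte digest computed in {elapsed_ms:.0f} ms")
```

Running the same function with a lower iteration count shows a proportionally lower runtime, which is why the iteration change dominates the per-login CPU cost.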


@@ -0,0 +1,6 @@
+ifeval::[{project_community}==true]
+= Changes to Password Hashing
+The release notes for {project_name} 24.0.0 have been updated with a corrected description of the expected performance impact of this change, as well as the related sizing guide.
+endif::[]


@@ -5,6 +5,10 @@
include::changes-25_0_0.adoc[leveloffset=3]
+=== Migrating to 24.0.2
+include::changes-24_0_2.adoc[leveloffset=3]
=== Migrating to 24.0.0
include::changes-24_0_0.adoc[leveloffset=3]


@@ -39,9 +39,9 @@ Memory requirements increase with the number of client sessions per user session
* In containers, Keycloak allocates 70% of the memory limit for heap based memory. It will also use approximately 300 MB of non-heap-based memory.
To calculate the requested memory, use the calculation above. As memory limit, subtract the non-heap memory from the value above and divide the result by 0.7.
-* For each 30 user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+* For each 8 password-based user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+
-{project_name} spends most of the CPU time hashing the password provided by the user.
+{project_name} spends most of the CPU time hashing the password provided by the user, and this time is proportional to the number of hash iterations.
* For each 450 client credential grants per second, 1 vCPU per Pod in a three node cluster (tested with up to 2000 per second).
+
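The container memory rule in this hunk can be turned into a quick calculation. A minimal sketch, assuming the 300 MB non-heap figure and the 70% heap ratio stated above, applied to the 1250 MB requested-memory value from the sizing example later in this file:

```python
NON_HEAP_MB = 300   # approximate non-heap memory stated in the rule above
HEAP_RATIO = 0.7    # Keycloak allocates 70% of the container memory limit to heap

def memory_limit_mb(requested_mb: float) -> float:
    """Memory limit such that 70% of it covers the heap part of the request:
    subtract the non-heap memory from the requested value and divide by 0.7."""
    return (requested_mb - NON_HEAP_MB) / HEAP_RATIO

print(round(memory_limit_mb(1250)))  # ~1357 MB for a 1250 MB request
```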
@@ -58,19 +58,19 @@ Performance of {project_name} dropped significantly when its Pods were throttled
Target size:
* 50,000 active user sessions
-* 30 logins per seconds
+* 24 logins per second
* 450 client credential grants per second
* 350 refresh token requests per second
Limits calculated:
-* CPU requested: 3 vCPU
+* CPU requested: 5 vCPU
+
-(30 logins per second = 1 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token = 1 vCPU)
+(24 logins per second = 3 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token requests per second = 1 vCPU)
-* CPU limit: 9 vCPU
+* CPU limit: 15 vCPU
+
-(Allow for three times the CPU requested to handle peaks, startups and failover tasks, and also refresh token handling which we don't have numbers on, yet)
+(Allow for three times the CPU requested to handle peaks, startups, and failover tasks)
* Memory requested: 1250 MB
+
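The updated numbers in this example follow directly from the per-vCPU budgets given earlier in this guide. A hypothetical sketch of the arithmetic, assuming 8 password-based logins, 450 client credential grants, and 350 refresh token requests per second per vCPU:

```python
import math

# Per-vCPU throughput budgets from this sizing guide
LOGINS_PER_VCPU = 8      # password-based logins per second
GRANTS_PER_VCPU = 450    # client credential grants per second
REFRESH_PER_VCPU = 350   # refresh token requests per second
PEAK_FACTOR = 3          # limit = 3x requested, for peaks, startups and failover

def cpu_requested(logins: int, grants: int, refreshes: int) -> int:
    """Sum the vCPUs needed for each activity, rounding each up."""
    return (math.ceil(logins / LOGINS_PER_VCPU)
            + math.ceil(grants / GRANTS_PER_VCPU)
            + math.ceil(refreshes / REFRESH_PER_VCPU))

requested = cpu_requested(24, 450, 350)
print(requested, requested * PEAK_FACTOR)  # 5 vCPU requested, 15 vCPU limit
```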
@@ -86,15 +86,13 @@ The following setup was used to retrieve the settings above to run tests of abou
* OpenShift 4.14.x deployed on AWS via ROSA.
* Machinepool with `m5.4xlarge` instances.
-* {project_name} deployed with the Operator and 3 pods.
-* Default user password hashing with PBKDF2(SHA512) 210,000 hash iterations (which is the default).
-* Client credential grants don't use refresh tokens (which is the default).
-* Database seeded with 100,000 users and 100,000 clients.
-* Infinispan caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
-* All sessions in distributed caches as per default, with two owners per entries, allowing one failing Pod without losing data.
+* {project_name} deployed with the Operator and 3 pods in a high-availability setup with two sites in active/passive mode.
+* OpenShift's reverse proxy running in passthrough mode, where the TLS connection of the client is terminated at the Pod.
+* PostgreSQL deployed inside the same OpenShift with ephemeral storage.
+
+Using a database with persistent storage will have longer database latencies, which might lead to longer response times; still, the throughput should be similar.
+* Database Amazon Aurora PostgreSQL in a multi-AZ setup, with the writer instance in the availability zone of the primary site.
+* Default user password hashing with PBKDF2(SHA512) 210,000 hash iterations, which is the default https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pbkdf2[as recommended by OWASP].
+* Client credential grants don't use refresh tokens (which is the default).
+* Database seeded with 20,000 users and 20,000 clients.
+* Infinispan local caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
+* All sessions in distributed caches as per default, with two owners per entry, allowing one failing Pod without losing data.
</@tmpl.guide>