Updated performance impact due to changed hashing

Fixes #27900

Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Author: Alexander Schwartz, 2024-03-14 14:15:30 +01:00 (committed by Alexander Schwartz)
parent 4ab4fa94fb
commit fbdb2ed9f7
4 changed files with 27 additions and 18 deletions


@@ -408,8 +408,9 @@ This can be done by configuring the hash iterations explicitly in the password p
== Expected increased overall CPU usage and temporary increased database activity
The Concepts for sizing CPU and memory resources in the {project_name} High Availability guide have been updated to reflect the new hashing defaults.
-While the CPU usage per password-based login in our tests increased by 33% (which includes both the changed password hashing and unchanged TLS connection handling), the overall CPU increase should be around 10% to 15%.
-This is due to the averaging effect of {project_name}'s other activities like refreshing access tokens and client credential grants, still this depends on the unique workload of an installation.
+The CPU usage per password-based login in our tests increased by a factor of five, which includes both the changed password hashing and unchanged TLS connection handling.
+The overall CPU increase should be around a factor of two to three due to the averaging effect of {project_name}'s other activities, such as refreshing access tokens and client credential grants.
+Still, this depends on the unique workload of an installation.
After the upgrade, during a password-based login, the user's password will be re-hashed with the new hash algorithm and hash iterations as a one-off activity and updated in the database.
As this clears the user from {project_name}'s internal cache, you will also see an increased read activity on the database level.
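The per-login cost of the new hashing defaults can be reproduced with any PBKDF2 implementation. A minimal Python sketch, assuming the PBKDF2(SHA512) algorithm with 210,000 iterations mentioned later in this file (the fixed salt and password are illustrative only; absolute timings depend on hardware, but CPU time grows linearly with the iteration count):

```python
import hashlib
import time

def hash_password(password: str, salt: bytes, iterations: int = 210_000) -> bytes:
    """PBKDF2 with SHA-512; 210,000 iterations is the new default named in this change."""
    return hashlib.pbkdf2_hmac("sha512", password.encode(), salt, iterations)

salt = b"0123456789abcdef"  # illustrative fixed salt; Keycloak generates a random one per user
start = time.perf_counter()
digest = hash_password("s3cr3t", salt)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{len(digest)}-byte digest computed in {elapsed_ms:.0f} ms")
```

Running the same function with a lower iteration count shows a proportionally lower runtime, which is why the iteration change dominates the per-login CPU cost.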


@@ -0,0 +1,6 @@
+ifeval::[{project_community}==true]
+= Changes to Password Hashing
+The release notes for {project_name} 24.0.0 have been updated with a corrected description of the expected performance impact of this change, as well as the related sizing guide.
+endif::[]


@@ -5,6 +5,10 @@
include::changes-25_0_0.adoc[leveloffset=3]
+=== Migrating to 24.0.2
+include::changes-24_0_2.adoc[leveloffset=3]
=== Migrating to 24.0.0
include::changes-24_0_0.adoc[leveloffset=3]


@@ -39,9 +39,9 @@ Memory requirements increase with the number of client sessions per user session
* In containers, Keycloak allocates 70% of the memory limit for heap based memory. It will also use approximately 300 MB of non-heap-based memory.
To calculate the requested memory, use the calculation above. As memory limit, subtract the non-heap memory from the value above and divide the result by 0.7.
-* For each 30 user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+* For each 8 password-based user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+
-{project_name} spends most of the CPU time hashing the password provided by the user.
+{project_name} spends most of the CPU time hashing the password provided by the user, and this time is proportional to the number of hash iterations.
* For each 450 client credential grants per second, 1 vCPU per Pod in a three node cluster (tested with up to 2000 per second).
+
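The container memory rule in this hunk can be turned into a quick calculation. A minimal sketch, assuming the 300 MB non-heap figure and the 70% heap ratio stated above, applied to the 1250 MB requested-memory value from the sizing example later in this file:

```python
NON_HEAP_MB = 300   # approximate non-heap memory stated in the rule above
HEAP_RATIO = 0.7    # Keycloak allocates 70% of the container memory limit to heap

def memory_limit_mb(requested_mb: float) -> float:
    """Memory limit such that 70% of it covers the heap part of the request:
    subtract the non-heap memory from the requested value and divide by 0.7."""
    return (requested_mb - NON_HEAP_MB) / HEAP_RATIO

print(round(memory_limit_mb(1250)))  # ~1357 MB for a 1250 MB request
```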
@@ -58,19 +58,19 @@ Performance of {project_name} dropped significantly when its Pods were throttled
Target size:
* 50,000 active user sessions
-* 30 logins per seconds
+* 24 logins per second
* 450 client credential grants per second
* 350 refresh token requests per second
Limits calculated:
-* CPU requested: 3 vCPU
+* CPU requested: 5 vCPU
+
-(30 logins per second = 1 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token = 1 vCPU)
+(24 logins per second = 3 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token requests per second = 1 vCPU)
-* CPU limit: 9 vCPU
+* CPU limit: 15 vCPU
+
-(Allow for three times the CPU requested to handle peaks, startups and failover tasks, and also refresh token handling which we don't have numbers on, yet)
+(Allow for three times the CPU requested to handle peaks, startups, and failover tasks)
* Memory requested: 1250 MB
+
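The updated numbers in this example follow directly from the per-vCPU budgets given earlier in this guide. A hypothetical sketch of the arithmetic, assuming 8 password-based logins, 450 client credential grants, and 350 refresh token requests per second per vCPU:

```python
import math

# Per-vCPU throughput budgets from this sizing guide
LOGINS_PER_VCPU = 8      # password-based logins per second
GRANTS_PER_VCPU = 450    # client credential grants per second
REFRESH_PER_VCPU = 350   # refresh token requests per second
PEAK_FACTOR = 3          # limit = 3x requested, for peaks, startups and failover

def cpu_requested(logins: int, grants: int, refreshes: int) -> int:
    """Sum the vCPUs needed for each activity, rounding each up."""
    return (math.ceil(logins / LOGINS_PER_VCPU)
            + math.ceil(grants / GRANTS_PER_VCPU)
            + math.ceil(refreshes / REFRESH_PER_VCPU))

requested = cpu_requested(24, 450, 350)
print(requested, requested * PEAK_FACTOR)  # 5 vCPU requested, 15 vCPU limit
```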
@@ -86,15 +86,13 @@ The following setup was used to retrieve the settings above to run tests of abou
* OpenShift 4.14.x deployed on AWS via ROSA.
* Machinepool with `m5.4xlarge` instances.
-* {project_name} deployed with the Operator and 3 pods.
-* Default user password hashing with PBKDF2(SHA512) 210,000 hash iterations (which is the default).
-* Client credential grants don't use refresh tokens (which is the default).
-* Database seeded with 100,000 users and 100,000 clients.
-* Infinispan caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
-* All sessions in distributed caches as per default, with two owners per entries, allowing one failing Pod without losing data.
+* {project_name} deployed with the Operator and 3 pods in a high-availability setup with two sites in active/passive mode.
+* OpenShift's reverse proxy running in passthrough mode, where the TLS connection of the client is terminated at the Pod.
+* PostgreSQL deployed inside the same OpenShift with ephemeral storage.
+
+Using a database with persistent storage will have longer database latencies, which might lead to longer response times; still, the throughput should be similar.
+* Database Amazon Aurora PostgreSQL in a multi-AZ setup, with the writer instance in the availability zone of the primary site.
+* Default user password hashing with PBKDF2(SHA512) 210,000 hash iterations, which is the default https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pbkdf2[as recommended by OWASP].
+* Client credential grants don't use refresh tokens (which is the default).
+* Database seeded with 20,000 users and 20,000 clients.
+* Infinispan local caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
+* All sessions in distributed caches as per default, with two owners per entry, allowing one failing Pod without losing data.
</@tmpl.guide>