diff --git a/docs/documentation/upgrading/topics/changes/changes-24_0_0.adoc b/docs/documentation/upgrading/topics/changes/changes-24_0_0.adoc
index 19262ada69..a46bd8b710 100644
--- a/docs/documentation/upgrading/topics/changes/changes-24_0_0.adoc
+++ b/docs/documentation/upgrading/topics/changes/changes-24_0_0.adoc
@@ -408,8 +408,9 @@ This can be done by configuring the hash iterations explicitly in the password p
 == Expected increased overall CPU usage and temporary increased database activity
 
 The Concepts for sizing CPU and memory resources in the {project_name} High Availability guide have been updated to reflect the new hashing defaults.
-While the CPU usage per password-based login in our tests increased by 33% (which includes both the changed password hashing and unchanged TLS connection handling), the overall CPU increase should be around 10% to 15%.
-This is due to the averaging effect of {project_name}'s other activities like refreshing access tokens and client credential grants, still this depends on the unique workload of an installation.
+The CPU usage per password-based login in our tests increased by a factor of five, which includes both the changed password hashing and the unchanged TLS connection handling.
+The overall CPU increase should be around a factor of two to three due to the averaging effect of {project_name}'s other activities like refreshing access tokens and client credential grants.
+Still, this depends on the unique workload of an installation.
 
 After the upgrade, during a password-based login, the user's passwords will be re-hashed with the new hash algorithm and hash iterations as a one-off activity and updated in the database.
 As this clears the user from {project_name}'s internal cache, you will also see an increased read activity on the database level.
diff --git a/docs/documentation/upgrading/topics/changes/changes-24_0_2.adoc b/docs/documentation/upgrading/topics/changes/changes-24_0_2.adoc
new file mode 100644
index 0000000000..94ae4d88b1
--- /dev/null
+++ b/docs/documentation/upgrading/topics/changes/changes-24_0_2.adoc
@@ -0,0 +1,6 @@
+ifeval::[{project_community}==true]
+= Changes to Password Hashing
+
+The release notes for {project_name} 24.0.0 have been updated with a corrected description of the expected performance impact of the change, as well as the sizing guide.
+
+endif::[]
diff --git a/docs/documentation/upgrading/topics/changes/changes.adoc b/docs/documentation/upgrading/topics/changes/changes.adoc
index af63f24d54..c7a440ac02 100644
--- a/docs/documentation/upgrading/topics/changes/changes.adoc
+++ b/docs/documentation/upgrading/topics/changes/changes.adoc
@@ -5,6 +5,10 @@
 
 include::changes-25_0_0.adoc[leveloffset=3]
 
+=== Migrating to 24.0.2
+
+include::changes-24_0_2.adoc[leveloffset=3]
+
 === Migrating to 24.0.0
 
 include::changes-24_0_0.adoc[leveloffset=3]
diff --git a/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc b/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc
index f6ed19aae0..d61e5357bf 100644
--- a/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc
+++ b/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc
@@ -39,9 +39,9 @@ Memory requirements increase with the number of client sessions per user session
 * In containers, Keycloak allocates 70% of the memory limit for heap based memory. It will also use approximately 300 MB of non-heap-based memory. To calculate the requested memory, use the calculation above.
 As memory limit, subtract the non-heap memory from the value above and divide the result by 0.7.
 
-* For each 30 user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+* For each 8 password-based user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
 +
-{project_name} spends most of the CPU time hashing the password provided by the user.
+{project_name} spends most of the CPU time hashing the password provided by the user, and the time spent is proportional to the number of hash iterations.
 
 * For each 450 client credential grants per second, 1 vCPU per Pod in a three node cluster (tested with up to 2000 per second).
 +
@@ -58,19 +58,19 @@ Performance of {project_name} dropped significantly when its Pods were throttled
 Target size:
 
 * 50,000 active user sessions
-* 30 logins per seconds
+* 24 logins per second
 * 450 client credential grants per second
 * 350 refresh token requests per second
 
 Limits calculated:
 
-* CPU requested: 3 vCPU
+* CPU requested: 5 vCPU
 +
-(30 logins per second = 1 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token = 1 vCPU)
+(24 logins per second = 3 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token = 1 vCPU)
 
-* CPU limit: 9 vCPU
+* CPU limit: 15 vCPU
 +
-(Allow for three times the CPU requested to handle peaks, startups and failover tasks, and also refresh token handling which we don't have numbers on, yet)
+(Allow for three times the CPU requested to handle peaks, startups and failover tasks)
 
 * Memory requested: 1250 MB
 +
@@ -86,15 +86,13 @@ The following setup was used to retrieve the settings above to run tests of abou
 
 * OpenShift 4.14.x deployed on AWS via ROSA.
 * Machinepool with `m5.4xlarge` instances.
-* {project_name} deployed with the Operator and 3 pods.
-* Default user password hashing with PBKDF2(SHA512) 210,000 hash iterations (which is the default).
-* Client credential grants don't use refresh tokens (which is the default).
-* Database seeded with 100,000 users and 100,000 clients.
-* Infinispan caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
-* All sessions in distributed caches as per default, with two owners per entries, allowing one failing Pod without losing data.
+* {project_name} deployed with the Operator and 3 pods in a high-availability setup with two sites in active/passive mode.
 * OpenShift's reverse proxy running in passthrough mode were the TLS connection of the client is terminated at the Pod.
-* PostgreSQL deployed inside the same OpenShift with ephemeral storage.
-+
-Using a database with persistent storage will have longer database latencies, which might lead to longer response times; still, the throughput should be similar.
+* Amazon Aurora PostgreSQL database in a multi-AZ setup, with the writer instance in the availability zone of the primary site.
+* Default user password hashing with PBKDF2(SHA512) and 210,000 hash iterations, which is the default https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pbkdf2[as recommended by OWASP].
+* Client credential grants don't use refresh tokens (which is the default).
+* Database seeded with 20,000 users and 20,000 clients.
+* Infinispan local caches at the default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
+* All sessions in distributed caches as per default, with two owners per entry, allowing one failing Pod without losing data.
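
The updated sizing bullet states that the CPU time per password-based login is dominated by password hashing and is proportional to the number of hash iterations. For readers who want to confirm that relationship on their own hardware, the following standalone Java sketch times a single PBKDF2(SHA512) derivation using the JDK's built-in `PBKDF2WithHmacSHA512` algorithm at two iteration counts, including the new default of 210,000. It is illustration only, not part of the patch above; the class name, password, salt, and the lower comparison value of 27,500 are arbitrary choices.

[source,java]
----
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class Pbkdf2TimingSketch {

    // Derive a 512-bit key with PBKDF2-HMAC-SHA512 and return the elapsed time in milliseconds.
    static long timeHash(char[] password, byte[] salt, int iterations) throws Exception {
        long start = System.nanoTime();
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 512);
        SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512").generateSecret(spec).getEncoded();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        char[] password = "example-password".toCharArray();
        byte[] salt = new byte[16]; // a fixed, all-zero salt is acceptable for a timing comparison only

        timeHash(password, salt, 1_000); // warm-up run so JIT compilation does not skew the numbers

        System.out.println("27,500 iterations:  " + timeHash(password, salt, 27_500) + " ms");
        System.out.println("210,000 iterations: " + timeHash(password, salt, 210_000) + " ms");
    }
}
----

Because PBKDF2 cost grows linearly with the iteration count, the second measurement should take roughly 210,000 / 27,500 ≈ 7.6 times as long, which is the proportionality the reduced logins-per-vCPU figure in the sizing guide accounts for.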
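
The updated example calculation can also be reproduced mechanically from the per-vCPU figures in the sizing guide. The sketch below is a hypothetical helper (not Keycloak code) that applies those figures to the example target size and arrives at the same 5 vCPU request and 15 vCPU limit shown in the diff.

[source,java]
----
public class CpuSizingSketch {

    // Per-vCPU throughput figures quoted from the sizing guide (per Pod in a three-node cluster).
    static final double LOGINS_PER_VCPU = 8;            // password-based logins per second
    static final double CLIENT_GRANTS_PER_VCPU = 450;   // client credential grants per second
    static final double REFRESH_TOKENS_PER_VCPU = 350;  // refresh token requests per second

    // Round each workload up to whole vCPUs, matching the "for each N per second, 1 vCPU" rule.
    static int requestedVcpu(double logins, double grants, double refreshes) {
        return (int) (Math.ceil(logins / LOGINS_PER_VCPU)
                + Math.ceil(grants / CLIENT_GRANTS_PER_VCPU)
                + Math.ceil(refreshes / REFRESH_TOKENS_PER_VCPU));
    }

    public static void main(String[] args) {
        // Example target size from the guide: 24 logins/s, 450 grants/s, 350 refresh requests/s.
        int requested = requestedVcpu(24, 450, 350);
        System.out.println("CPU requested: " + requested + " vCPU");      // 3 + 1 + 1 = 5
        System.out.println("CPU limit:     " + requested * 3 + " vCPU");  // 3x headroom = 15
    }
}
----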