Argon2 release notes and sizing guide update

Closes #29033

Signed-off-by: Kamesh Akella <kamesh.asp@gmail.com>
Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Co-authored-by: Alexander Schwartz <alexander.schwartz@gmx.net>
Co-authored-by: Václav Muzikář <vaclav@muzikari.cz>
Co-authored-by: Alexander Schwartz <aschwart@redhat.com>
2024-05-14 11:40:51 -04:00
commit 1d613d9037
parent 5cacf8637c
3 changed files with 28 additions and 10 deletions
--- a/docs/documentation/release_notes/topics/25_0_0.adoc
+++ b/docs/documentation/release_notes/topics/25_0_0.adoc
@@ -8,7 +8,7 @@ In {project_name} 24, the Welcome page is updated to use https://www.patternfly.

 = Argon2 password hashing

-Argon2 is now the default password hashing algorithm used by {project_name}
+Argon2 is now the default password hashing algorithm used by {project_name} in a non-FIPS environment.

 Argon2 was the winner of the https://en.wikipedia.org/wiki/Password_Hashing_Competition[2015 password hashing competition]
 and is the recommended hashing algorithm by https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#argon2id[OWASP].
@@ -19,6 +19,7 @@ better security, with almost the same CPU time as previous releases of {project_
 memory, which is a requirement to be resistant against GPU attacks. The defaults for Argon2 in {project_name} require 7MB
 per hashing request.
 To prevent excessive memory and CPU usage, the parallel computation of hashes by Argon2 is by default limited to the number of cores available to the JVM.
+To support the memory-intensive nature of Argon2, we have updated the default GC from ParallelGC to G1GC for better heap utilization.
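
As a rough illustration of the memory bound described above, the following sketch multiplies the 7 MB per-hash figure by the number of hashes allowed to run in parallel. The function name `peak_hashing_memory_mb` is hypothetical, and `os.cpu_count()` is only a stand-in for "the number of cores available to the JVM"; a containerized JVM may see a different value.

```python
import os

# Per-hash memory taken from the release note text (7 MB per hashing request).
ARGON2_MEMORY_MB = 7


def peak_hashing_memory_mb(parallel_hashes: int, per_hash_mb: int = ARGON2_MEMORY_MB) -> int:
    """Upper bound on memory held by Argon2 hashes running at the same time."""
    if parallel_hashes < 1:
        raise ValueError("parallel_hashes must be at least 1")
    return parallel_hashes * per_hash_mb


if __name__ == "__main__":
    # Assumption: use the local core count as a proxy for the JVM's view.
    cores = os.cpu_count() or 1
    print(f"{cores} parallel hashes need at most {peak_hashing_memory_mb(cores)} MB")
```

This only estimates the memory used by in-flight hashes; it says nothing about the rest of the heap.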

 = New Hostname options

--- a/docs/documentation/upgrading/topics/changes/changes-25_0_0.adoc
+++ b/docs/documentation/upgrading/topics/changes/changes-25_0_0.adoc
@@ -100,6 +100,23 @@ http_server_requests_seconds_sum{method="GET",outcome="SUCCESS",status="200",uri
 Use the new options `http-metrics-histograms-enabled` and `http-metrics-slos` to enable default histogram buckets or specific buckets for service level objectives (SLOs).
 Read more about histograms in the https://prometheus.io/docs/concepts/metric_types/#histogram[Prometheus documentation about histograms] on how to use the additional metrics series provided in `http_server_requests_seconds_bucket`.

+= Argon2 password hashing
+
+In the {project_name} 24 release, a change to the password hashing algorithm increased CPU usage. To address that, we switched to a different default hashing algorithm, Argon2, for non-FIPS environments, which brings CPU usage back to where it was prior to the {project_name} 24 release.
+
+== Expected improvement in overall CPU usage and temporary increased database activity
+
+The Concepts for sizing CPU and memory resources in the {project_name} High Availability guide have been updated to reflect the new hashing defaults.
+
+After the upgrade, each user's password is re-hashed with the new hash algorithm and hash iterations during that user's next password-based login, as a one-off activity, and updated in the database.
+As this clears the user from {project_name}'s internal cache, you'll also see increased read activity on the database level.
+This increased database activity will decrease over time as more and more users' passwords have been re-hashed.
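
The re-hash-on-login behavior described above can be sketched in general terms as follows. This is not {project_name}'s implementation: the names `StoredCredential`, `hash_password`, and `login` are hypothetical, and PBKDF2 from the Python standard library stands in for Argon2, which the standard library does not provide. The point is only the flow: verify against the stored parameters, then re-hash once if those parameters no longer match the current policy.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass
class StoredCredential:
    algorithm: str   # hashlib digest name used by PBKDF2, e.g. "sha256"
    iterations: int
    salt: bytes
    digest: bytes


# Hypothetical current policy; PBKDF2 is a stand-in for Argon2 in this sketch.
POLICY_ALGORITHM = "sha512"
POLICY_ITERATIONS = 2_000


def hash_password(password: str, algorithm: str, iterations: int) -> StoredCredential:
    """Hash a password with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(algorithm, password.encode(), salt, iterations)
    return StoredCredential(algorithm, iterations, salt, digest)


def login(password: str, stored: StoredCredential) -> tuple[bool, StoredCredential]:
    """Verify the password; re-hash it if the stored parameters are outdated."""
    candidate = hashlib.pbkdf2_hmac(
        stored.algorithm, password.encode(), stored.salt, stored.iterations
    )
    if not hmac.compare_digest(candidate, stored.digest):
        return False, stored
    if (stored.algorithm, stored.iterations) != (POLICY_ALGORITHM, POLICY_ITERATIONS):
        # One-off write: a real system would persist this new credential
        # and evict the cached user entry, causing the extra database reads.
        stored = hash_password(password, POLICY_ALGORITHM, POLICY_ITERATIONS)
    return True, stored
```

A second login with the same password finds the credential already matching the policy and performs no further write, which is why the extra database activity fades as users log in.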
+
+== Updated JVM garbage collection settings
+
+To support the memory-intensive nature of Argon2, we have updated the default GC from ParallelGC to G1GC for better heap utilization.
+Please monitor the JVM heap utilization closely after this upgrade. Additional tuning may be necessary depending on your specific workload.
+
 = Limiting memory usage when consuming HTTP responses

 In some scenarios like brokering Keycloak uses HTTP to talk to external servers.
--- a/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc
+++ b/docs/guides/high-availability/concepts-memory-and-cpu-sizing.adoc
@@ -39,11 +39,11 @@ Memory requirements increase with the number of client sessions per user session
 * In containers, Keycloak allocates 70% of the memory limit for heap based memory. It will also use approximately 300 MB of non-heap-based memory.
 To calculate the requested memory, use the calculation above. As memory limit, subtract the non-heap memory from the value above and divide the result by 0.7.

-* For each 8 password-based user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+* For each 45 password-based user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
 +
 {project_name} spends most of the CPU time hashing the password provided by the user, and it is proportional to the number of hash iterations.

-* For each 450 client credential grants per second, 1 vCPU per Pod in a three node cluster (tested with up to 2000 per second).
+* For each 500 client credential grants per second, 1 vCPU per Pod in a three-node cluster (tested with up to 2000 per second).
 +
 Most CPU time goes into creating new TLS connections, as each client runs only a single request.

@@ -58,17 +58,17 @@ Performance of {project_name} dropped significantly when its Pods were throttled
 Target size:

 * 50,000 active user sessions
-* 24 logins per seconds
-* 450 client credential grants per second
+* 45 logins per second
+* 500 client credential grants per second
 * 350 refresh token requests per second

 Limits calculated:

-* CPU requested: 5 vCPU
+* CPU requested: 3 vCPU
 +
-(24 logins per second = 3 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token = 1 vCPU)
+(45 logins per second = 1 vCPU, 500 client credential grants per second = 1 vCPU, 350 refresh token requests per second = 1 vCPU)

-* CPU limit: 15 vCPU
+* CPU limit: 9 vCPU
 +
 (Allow for three times the CPU requested to handle peaks, startups and failover tasks)
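
The CPU arithmetic in the updated example can be reproduced with a small sketch. The per-vCPU rates come from the updated guide text (45 logins, 500 client credential grants) and from the worked example (350 refresh token requests). The function names and the choice to round the sum up to a whole vCPU are assumptions made for illustration.

```python
import math

# Requests per second that one vCPU per Pod handles, per the updated guide.
LOGINS_PER_VCPU = 45
GRANTS_PER_VCPU = 500
REFRESHES_PER_VCPU = 350  # taken from the worked example, not a separate rule here
PEAK_FACTOR = 3           # "three times the CPU requested" for peaks and failover


def cpu_request(logins: float, grants: float, refreshes: float) -> int:
    """vCPU to request per Pod for the given per-second load (rounded up)."""
    raw = (
        logins / LOGINS_PER_VCPU
        + grants / GRANTS_PER_VCPU
        + refreshes / REFRESHES_PER_VCPU
    )
    return math.ceil(raw)


def cpu_limit(requested: int, factor: int = PEAK_FACTOR) -> int:
    """vCPU limit per Pod, leaving headroom for peaks, startups and failover."""
    return requested * factor


if __name__ == "__main__":
    requested = cpu_request(logins=45, grants=500, refreshes=350)
    print(requested, cpu_limit(requested))
```

With the target size above, the sketch yields the same 3 vCPU requested and 9 vCPU limit as the guide.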

@@ -102,9 +102,9 @@ The following setup was used to retrieve the settings above to run tests of abou
 * {project_name} deployed with the Operator and 3 pods in a high-availability setup with two sites in active/passive mode.
 * OpenShift's reverse proxy running in passthrough mode, where the TLS connection of the client is terminated at the Pod.
 * Database Amazon Aurora PostgreSQL in a multi-AZ setup, with the writer instance in the availability zone of the primary site.
-* Default user password hashing with PBKDF2(SHA512) 210,000 hash iterations which is the default https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pbkdf2[as recommended by OWASP].
+* Default user password hashing with Argon2, 5 hash iterations, and a minimum memory size of 7 MiB, https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#argon2id[as recommended by OWASP].
 * Client credential grants don't use refresh tokens (which is the default).
-* Database seeded with 20,000 users and 20,000 clients.
+* Database seeded with 100,000 users and 100,000 clients.
 * Infinispan local caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
 * All sessions in distributed caches as per default, with two owners per entry, allowing one failing Pod without losing data.