title="Concepts for sizing CPU and memory resources"
summary="Understand these concepts to avoid resource exhaustion and congestion"
tileVisible="false" >
Use this as a starting point to size a product environment.
Adjust the values for your environment as needed based on your load tests.
== Performance recommendations
[WARNING]
====
* Performance will be lowered when scaling to more Pods (due to additional overhead) and when using a cross-datacenter setup (due to additional traffic and operations).
* Increased cache sizes can improve performance when {project_name} instances run for a longer time.
This will decrease response times and reduce IOPS on the database.
Still, those caches need to be filled when an instance is restarted, so do not set resources too tight based on the stable state measured once the caches have been filled.
* In containers, {project_name} allocates 70% of the memory limit for heap-based memory. It will also use approximately 300 MB of non-heap memory.
To calculate the requested memory, use the calculation above. For the memory limit, subtract the non-heap memory from the value above and divide the result by 0.7; see the sizing sketch after this warning.
* Leave 200% extra headroom for CPU usage to handle spikes in the load.
This ensures a fast startup of the node and sufficient capacity to handle failover tasks such as re-balancing Infinispan caches when one node fails.
Performance of {project_name} dropped significantly when its Pods were throttled in our tests.
====
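
As an illustration of the memory and CPU rules in the warning above, the following sketch converts requested memory and CPU values into container limits.
The input values of 1250 MB and 0.5 vCPU are hypothetical placeholders, not sizing recommendations from this guide; replace them with the results of your own load tests.

[source,python]
----
# Sketch of the rules above: 70% of the memory limit is used for heap,
# about 300 MB is non-heap memory, and the CPU limit leaves 200% headroom.
# The input values below are placeholders, not recommendations.

NON_HEAP_MB = 300      # approximate non-heap memory used in containers
HEAP_FRACTION = 0.70   # share of the memory limit allocated to heap-based memory
CPU_HEADROOM = 2.0     # 200% extra headroom for spikes, startup, and failover

def memory_limit_mb(requested_mb: float) -> float:
    """Memory limit: subtract the non-heap memory, then divide by 0.7."""
    return (requested_mb - NON_HEAP_MB) / HEAP_FRACTION

def cpu_limit(requested_cpu: float) -> float:
    """CPU limit: requested CPU plus 200% headroom."""
    return requested_cpu * (1 + CPU_HEADROOM)

requested_memory_mb = 1250   # hypothetical requested memory per Pod
requested_cpu = 0.5          # hypothetical requested vCPU per Pod

print(f"memory request: {requested_memory_mb} MB")
print(f"memory limit:   {memory_limit_mb(requested_memory_mb):.0f} MB")   # ~1357 MB
print(f"CPU request:    {requested_cpu} vCPU")
print(f"CPU limit:      {cpu_limit(requested_cpu)} vCPU")                 # 1.5 vCPU
----
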
When using persistent sessions, {project_name} will use less memory as it will keep only a subset of sessions in memory.
At the same time it will use more CPU resources to communicate with the database, and it will use a lot more CPU and write IO on the database to keep the session information up to date.
A performance test showed the following resource usage per 100 requests per second that update the database (login, logout, refresh token), tested up to 300 requests per second (a scaling sketch follows below):
* 0.25 vCPU on each Pod in a 3-node {project_name} cluster.
* 0.25 CPU and 800 WriteIOPS on an Aurora PostgreSQL multi-AZ database based on a `db.t4g.large` instance type.
The average latency of the requests increased by 20-40 ms when running on an Aurora PostgreSQL multi-AZ database with a single reader instance in another AZ.
The latency is expected to be lower when running in a single AZ or with a non-replicated database.
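
To illustrate how the measured figures above scale with load, the following sketch extrapolates them to a hypothetical target of 250 requests per second.
The target value is an assumption for this example; keep it within the tested range of up to 300 requests per second and validate the results with your own load tests.

[source,python]
----
# Sketch scaling the measured per-100-requests/second figures above.
# The target load is a placeholder; validate the results with your own load tests.

CPU_PER_100_RPS_PER_POD = 0.25    # vCPU on each Pod of the 3-node cluster
DB_CPU_PER_100_RPS = 0.25         # vCPU on the Aurora PostgreSQL database
DB_WRITE_IOPS_PER_100_RPS = 800   # WriteIOPS on the database

target_rps = 250                  # hypothetical requests/second that update the database
scale = target_rps / 100

print(f"additional CPU per Pod:  {CPU_PER_100_RPS_PER_POD * scale:.2f} vCPU")
print(f"additional database CPU: {DB_CPU_PER_100_RPS * scale:.2f} vCPU")
print(f"additional WriteIOPS:    {DB_WRITE_IOPS_PER_100_RPS * scale:.0f}")
----
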
The following setup was used to obtain these numbers:

* {project_name} deployed with the Operator and 3 Pods in a high-availability setup with two sites in active/passive mode.
* OpenShift's reverse proxy running in passthrough mode, where the TLS connection of the client is terminated at the Pod.
* Database Amazon Aurora PostgreSQL in a multi-AZ setup, with the writer instance in the availability zone of the primary site.
* Default user password hashing with PBKDF2(SHA512) and 210,000 hash iterations, which is the default https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pbkdf2[as recommended by OWASP].
* Database seeded with 20,000 users and 20,000 clients.
* Infinispan local caches at the default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.