<#import "/templates/guide.adoc" as tmpl> <#import "/templates/links.adoc" as links> <@tmpl.guide title="Concepts for sizing CPU and memory resources" summary="Understand these concepts to avoid resource exhaustion and congestion" preview="true" tileVisible="false" > Use this as a starting point to size a product environment. Adjust the values for your environment as needed based on your load tests. == Performance recommendations [WARNING] ==== * Performance will be lowered when scaling to more Pods (due to additional overhead) and using a cross-datacenter setup (due to additional traffic and operations). * Increased cache sizes can improve the performance when {project_name} instances run for a longer time. Still, those caches need to be filled when an instance is restarted. * Use these values as a starting point and perform your own load tests before going into production. ==== Summary: * The used CPU scales linearly with the number of requests up to the tested limit below. * The used memory scales linearly with the number of active sessions up to the tested limit below. Recommendations: * The base memory usage for an inactive Pod is 1 GB of RAM. * Leave 1 GB extra head-room for spikes of RAM. * For each 100,000 active user sessions, add 500 MB per Pod in a three-node cluster (tested with up to 200,000 sessions). + This assumes that each user connects to only one client. Memory requirements increase with the number of client sessions per user session (not tested yet). * For each 40 user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second). + {project_name} spends most of the CPU time hashing the password provided by the user. * For each 450 client credential grants per second, 1 vCPU per Pod in a three node cluster (tested with up to 2000 per second). + Most CPU time goes into creating new TLS connections, as each client runs only a single request. * For each 350 refresh token requests per second, 1 vCPU per Pod in a three node cluster (tested with up to 435 refresh token requests per second). * Leave 200% extra head-room for CPU usage to handle spikes in the load. This ensures a fast startup of the node, and sufficient capacity to handle failover tasks like, for example, re-balancing Infinispan caches, when one node fails. Performance of {project_name} dropped significantly when its Pods were throttled in our tests. === Calculation example Target size: * 50,000 active user sessions * 40 logins per seconds * 450 client credential grants per second * 350 refresh token requests per second Limits calculated: * CPU requested: 3 vCPU + (40 logins per second = 1 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token = 1 vCPU) * CPU limit: 9 vCPU + (Allow for three times the CPU requested to handle peaks, startups and failover tasks, and also refresh token handling which we don't have numbers on, yet) * Memory requested: 1.25 GB + (1 GB base memory plus 250 MB RAM for 50,000 active sessions) * Memory limit: 2.25 GB + (adding 1 GB to the memory requested) == Reference architecture The following setup was used to retrieve the settings above to run tests of about 10 minutes for different scenarios: * OpenShift 4.13.x deployed on AWS via ROSA. * Machinepool with `m5.4xlarge` instances. * {project_name} deployed with the Operator and 3 pods. * Default user password hashing with PBKDF2 27,500 hash iterations. * Database seeded with 100,000 users and 100,000 clients. 
== Reference architecture

The following setup was used to retrieve the settings above to run tests of about 10 minutes for different scenarios:

* OpenShift 4.13.x deployed on AWS via ROSA.
* Machinepool with `m5.4xlarge` instances.
* {project_name} deployed with the Operator and 3 pods.
* Default user password hashing with PBKDF2 27,500 hash iterations.
* Database seeded with 100,000 users and 100,000 clients.
* Infinispan caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
* All sessions in distributed caches as per default, with two owners per entry, allowing one failing pod without losing data.
* PostgreSQL deployed inside the same OpenShift with ephemeral storage.
+
Using a database with persistent storage will have longer database latencies, which might lead to longer response times; still, the throughput should be similar.
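When deploying {project_name} on Kubernetes or OpenShift, as in the reference architecture above, the calculated values end up as the CPU and memory requests and limits of each {project_name} Pod.
The following sketch only shows that mapping for the calculation example; where you set these values, for example in the {project_name} custom resource or in a plain Deployment, depends on how you deploy {project_name}, and the memory quantities assume decimal (M) units.

[source,python]
----
import json

# Illustrative mapping of the calculation example onto the standard Kubernetes
# container "resources" stanza. The field names follow the Kubernetes API;
# the values come from the calculation example above (1.25 GB and 2.25 GB
# expressed as decimal megabytes).
resources = {
    "requests": {"cpu": "3", "memory": "1250M"},
    "limits": {"cpu": "9", "memory": "2250M"},
}

print(json.dumps(resources, indent=2))
----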