<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>

<@tmpl.guide
title="Concepts for sizing CPU and memory resources"
summary="Understand these concepts to avoid resource exhaustion and congestion"
preview="true"
tileVisible="false" >

Use this as a starting point to size a product environment. 
Adjust the values for your environment as needed based on your load tests.

== Performance recommendations

[WARNING]
====
* Performance will be lowered when scaling to more Pods (due to additional overhead) and using a cross-datacenter setup (due to additional traffic and operations).

* Increased cache sizes can improve the performance when {project_name} instances run for a longer time. Still, those caches need to be filled when an instance is restarted.

* Use these values as a starting point and perform your own load tests before going into production.
====

Summary:

* The used CPU scales linearly with the number of requests up to the tested limit below.
* The used memory scales linearly with the number of active sessions up to the tested limit below.

Recommendations:

* The base memory usage for an inactive Pod is 1 GB of RAM.

* Leave 1 GB extra head-room for spikes of RAM.

* For each 100,000 active user sessions, add 500 MB per Pod in a three-node cluster (tested with up to 200,000 sessions).
+
This assumes that each user connects to only one client.
Memory requirements increase with the number of client sessions per user session (not tested yet).

* For each 40 user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+
{project_name} spends most of the CPU time hashing the password provided by the user.

* For each 450 client credential grants per second, 1 vCPU per Pod in a three node cluster (tested with up to 2000 per second).
+
Most CPU time goes into creating new TLS connections, as each client runs only a single request.

* For each 350 refresh token requests per second, 1 vCPU per Pod in a three node cluster (tested with up to 435 refresh token requests per second).

* Leave 200% extra head-room for CPU usage to handle spikes in the load.
This ensures a fast startup of the node, and sufficient capacity to handle failover tasks like, for example, re-balancing Infinispan caches, when one node fails.
Performance of {project_name} dropped significantly when its Pods were throttled in our tests.

=== Calculation example

Target size:

* 50,000 active user sessions
* 40 logins per seconds
* 450 client credential grants per second
* 350 refresh token requests per second

Limits calculated:

* CPU requested: 3 vCPU
+
(40 logins per second = 1 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token = 1 vCPU)

* CPU limit: 9 vCPU
+
(Allow for three times the CPU requested to handle peaks, startups and failover tasks, and also refresh token handling which we don't have numbers on, yet)

* Memory requested: 1.25 GB
+
(1 GB base memory plus 250 MB RAM for 50,000 active sessions)

* Memory limit: 2.25 GB
+
(adding 1 GB to the memory requested)

== Reference architecture

The following setup was used to retrieve the settings above to run tests of about 10 minutes for different scenarios:

* OpenShift 4.13.x deployed on AWS via ROSA.
* Machinepool with `m5.4xlarge` instances.
* {project_name} deployed with the Operator and 3 pods.
* Default user password hashing with PBKDF2 27,500 hash iterations.
* Database seeded with 100,000 users and 100,000 clients.
* Infinispan caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
* All sessions in distributed caches as per default, with two owners per entries, allowing one failing pod without losing data.
* PostgreSQL deployed inside the same OpenShift with ephemeral storage.
+
Using a database with persistent storage will have longer database latencies, which might lead to longer response times; still, the throughput should be similar.

</@tmpl.guide>