<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<#import "/templates/profile.adoc" as profile>
<@tmpl.guide
title="Multi-site deployments"
summary="Connect multiple {project_name} deployments in different sites to increase the overall availability" >
{project_name} supports deployments that consist of multiple {project_name} instances that connect to each other using their Infinispan caches; load balancers can distribute the load evenly across those instances.
Those setups are intended for a transparent network on a single site.

The {project_name} high-availability guide goes one step further and describes setups across multiple sites.
While this setup adds complexity, the additional availability may be needed for some environments.

== When to use a multi-site setup
The multi-site deployment capabilities of {project_name} are targeted at use cases that:
* Are constrained to a single
<@profile.ifProduct>
AWS Region.
</@profile.ifProduct>
<@profile.ifCommunity>
AWS Region or an equivalent low-latency setup.
</@profile.ifCommunity>
* Permit planned outages for maintenance.
* Fit within a defined user and request count.
* Can accept the impact of periodic outages.
<@profile.ifCommunity>
== Tested Configuration
We regularly test {project_name} with the following configuration:
</@profile.ifCommunity>
<@profile.ifProduct>
== Supported Configuration
</@profile.ifProduct>
* Two OpenShift single-AZ clusters, in the same AWS Region
** Provisioned with https://www.redhat.com/en/technologies/cloud-computing/openshift/aws[Red Hat OpenShift Service on AWS] (ROSA),
<@profile.ifProduct>
either ROSA HCP or ROSA classic.
</@profile.ifProduct>
<@profile.ifCommunity>
using ROSA HCP.
</@profile.ifCommunity>
** Each OpenShift cluster has all its workers in a single Availability Zone.
** OpenShift version
<@profile.ifProduct>
4.16 (or later).
</@profile.ifProduct>
<@profile.ifCommunity>
4.16.
</@profile.ifCommunity>
* Amazon Aurora PostgreSQL database (a provisioning sketch follows below)
** High availability with a primary DB instance in one Availability Zone, and a synchronously replicated reader in the second Availability Zone
** Version ${properties["aurora-postgresql.version"]}
* AWS Global Accelerator, sending traffic to both ROSA clusters
* AWS Lambda
<@profile.ifCommunity>
triggered by ROSA's Prometheus and Alertmanager
</@profile.ifCommunity>
to automate failover, as sketched below
<@profile.ifProduct>
Any deviation from the configuration above is not supported and any issue must be replicated in that environment for support.
</@profile.ifProduct>
<@profile.ifCommunity>
While equivalent setups should work, you will need to verify the performance and failure behavior of your environment.
We provide functional tests, failure tests and load tests in the https://github.com/keycloak/keycloak-benchmark[Keycloak Benchmark Project].
</@profile.ifCommunity>
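
To make the database item above more tangible, the following is a minimal sketch of provisioning such an Aurora PostgreSQL cluster with boto3.
The region, Availability Zones, identifiers, instance class, and engine version are illustrative assumptions only; the blueprint {sections} later in this guide describe the actual setup.

[source,python]
----
# Illustrative sketch only: region, Availability Zones, identifiers,
# instance class, and engine version are assumptions, not requirements.
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Aurora PostgreSQL cluster shared by both sites.
rds.create_db_cluster(
    DBClusterIdentifier="keycloak-aurora",
    Engine="aurora-postgresql",
    EngineVersion="16.1",
    DatabaseName="keycloak",
    MasterUsername="keycloak",
    MasterUserPassword="change-me",
)

# Primary (writer) DB instance in the first Availability Zone.
rds.create_db_instance(
    DBInstanceIdentifier="keycloak-aurora-writer",
    DBClusterIdentifier="keycloak-aurora",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-postgresql",
    AvailabilityZone="eu-west-1a",
)

# Synchronously replicated reader in the second Availability Zone.
rds.create_db_instance(
    DBInstanceIdentifier="keycloak-aurora-reader",
    DBClusterIdentifier="keycloak-aurora",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-postgresql",
    AvailabilityZone="eu-west-1b",
)
----

Both sites then point {project_name}'s `db-url` at the cluster's writer endpoint.
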
Read more on each item in the <@links.ha id="bblocks-multi-site" /> {section}.
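
The failover automation listed above can be pictured as a small function that drains traffic from a failed site.
The following is a minimal sketch, assuming the function is exposed through a Lambda function URL, that Alertmanager posts its standard webhook payload to it, and that a hypothetical `site` label identifies the failed site's endpoint; the deployed blueprint differs in detail.

[source,python]
----
# Illustrative sketch only: the environment variable, the `site` label, and
# the weight handling are assumptions, not the blueprint implementation.
import json
import os

import boto3

# The Global Accelerator control plane is served from us-west-2.
client = boto3.client("globalaccelerator", region_name="us-west-2")

LISTENER_ARN = os.environ["ACCELERATOR_LISTENER_ARN"]  # hypothetical variable


def handler(event, context):
    """Drain the failed site so Global Accelerator sends all traffic to the
    remaining healthy site."""
    payload = json.loads(event["body"])  # Alertmanager webhook body
    # Hypothetical label naming (part of) the failed site's endpoint ARN.
    failed_site = payload["alerts"][0]["labels"]["site"]

    group = client.list_endpoint_groups(ListenerArn=LISTENER_ARN)["EndpointGroups"][0]
    configurations = [
        {
            "EndpointId": endpoint["EndpointId"],
            # A weight of zero removes the failed site from the rotation.
            "Weight": 0 if failed_site in endpoint["EndpointId"] else endpoint["Weight"],
            "ClientIPPreservationEnabled": endpoint.get("ClientIPPreservationEnabled", False),
        }
        for endpoint in group["EndpointDescriptions"]
    ]
    client.update_endpoint_group(
        EndpointGroupArn=group["EndpointGroupArn"],
        EndpointConfigurations=configurations,
    )
    return {"statusCode": 200}
----

Bringing the failed site back into the rotation afterwards is a manual step, in line with the limitations below.
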
<@profile.ifProduct>
== Maximum load
</@profile.ifProduct>
<@profile.ifCommunity>
== Tested load
We regularly test {project_name} with the following load:
</@profile.ifCommunity>
* 100,000 users
* 300 requests per second
<@profile.ifCommunity>
While we did not see a hard limit in our tests with these values, we ask you to test for higher volumes with horizontally and vertically scaled {project_name} instances and databases; a minimal smoke-test sketch follows below.
</@profile.ifCommunity>
See the <@links.ha id="concepts-memory-and-cpu-sizing" /> {section} for more information.
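
As a starting point for verifying a comparable load in your own environment, the following is a minimal smoke-test sketch that drives password-grant logins against the OpenID Connect token endpoint.
The URL, realm, client, and credentials are assumptions; sustained load at the rates above needs a dedicated load-testing tool rather than a single-threaded script.

[source,python]
----
# Illustrative sketch only: URL, realm, client, and credentials are assumptions.
# It requires a public client with direct access grants enabled and a test user.
import time

import requests

TOKEN_URL = "https://keycloak.example.com/realms/myrealm/protocol/openid-connect/token"
TARGET_RPS = 300  # the request rate listed above


def issue_token(session: requests.Session) -> bool:
    """Perform one password-grant login and report whether it succeeded."""
    response = session.post(
        TOKEN_URL,
        data={
            "grant_type": "password",
            "client_id": "load-test-client",
            "username": "test-user",
            "password": "test-password",
        },
        timeout=5,
    )
    return response.status_code == 200


def run(duration_seconds: int = 60) -> None:
    session = requests.Session()
    interval = 1.0 / TARGET_RPS
    successes = failures = 0
    end = time.monotonic() + duration_seconds
    while time.monotonic() < end:
        started = time.monotonic()
        if issue_token(session):
            successes += 1
        else:
            failures += 1
        # Crude pacing: a single thread cannot sustain 300 requests per second,
        # so real tests fan the load out across many workers.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
    print(f"successful logins: {successes}, failures: {failures}")


if __name__ == "__main__":
    run()
----
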
== Limitations
<@profile.ifCommunity>
Even with the additional redundancy of the two sites, downtimes can still occur:
</@profile.ifCommunity>
* During upgrades of {project_name} or {jdgserver_name}, both sites need to be taken offline for the duration of the upgrade.
* During certain failure scenarios, there may be downtime of up to 5 minutes.
* After certain failure scenarios, manual intervention may be required to restore redundancy by bringing the failed site back online.
* During certain switchover scenarios, there may be downtime of up to 5 minutes.
For more details on limitations see the <@links.ha id="concepts-multi-site" /> {section}.
== Next steps
The different {sections} introduce the necessary concepts and building blocks.
For each building block, a blueprint shows how to set up a fully functional example.
Additional performance tuning and security hardening are still recommended when preparing a production setup.
<@profile.ifCommunity>
== Concept and building block overview
* <@links.ha id="concepts-multi-site" />
* <@links.ha id="bblocks-multi-site" />
* <@links.ha id="concepts-database-connections" />
* <@links.ha id="concepts-threads" />
* <@links.ha id="concepts-memory-and-cpu-sizing" />
* <@links.ha id="concepts-infinispan-cli-batch" />
== Blueprints for building blocks
* <@links.ha id="deploy-aurora-multi-az" />
* <@links.ha id="deploy-infinispan-kubernetes-crossdc" />
* <@links.ha id="deploy-keycloak-kubernetes" />
* <@links.ha id="deploy-aws-accelerator-loadbalancer" />
* <@links.ha id="deploy-aws-accelerator-fencing-lambda" />
== Operational procedures
* <@links.ha id="operate-synchronize" />
* <@links.ha id="operate-site-offline" />
* <@links.ha id="operate-site-online" />
* <@links.ha id="health-checks-multi-site" />
</@profile.ifCommunity>
</@tmpl.guide>