Adding a Keycloak High Availability section to Keycloak's docs

The content was moved over from the Keycloak Benchmark subproject.

Closes #24844

Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
Co-authored-by: Pedro Ruivo <pruivo@redhat.com>
Co-authored-by: Michal Hajas <mhajas@redhat.com>
Co-authored-by: Kamesh Akella <kakella@redhat.com>
Co-authored-by: Ryan Emerson <remerson@redhat.com>
Co-authored-by: Anna Manukyan <amanukya@redhat.com>
Co-authored-by: Thomas Darimont <thomas.darimont@googlemail.com>
Co-authored-by: Stian Thorgersen <stian@redhat.com>
Co-authored-by: AndyMunro <amunro@redhat.com>
Alexander Schwartz 2023-11-23 13:27:47 +01:00 committed by GitHub
parent da260b386c
commit 834ef79509
49 changed files with 5255 additions and 7 deletions


@ -83,3 +83,10 @@ Keycloak supports new password policy, which allows to specify the maximum age o
When this password policy is set to 0, the user will be required to re-authenticate to change the password in the Account Console or by other means.
You can also specify a lower or higher value than the default value of 5 minutes. Thanks to https://github.com/thomasdarimont[Thomas Darimont] for the contribution.
= Preview support for multi-site active-passive deployments
Deploying Keycloak to multiple independent sites is essential for some environments to provide high availability and a speedy recovery from failures.
This release adds preview support for active-passive deployments of Keycloak.
A lot of work has gone into testing and verifying a setup which can sustain load and recover from the failure scenarios.
To get started, use the high-availability guide, which also includes a comprehensive blueprint to deploy a highly available Keycloak to a cloud environment.


@ -19,5 +19,26 @@
<include>pinned-guides</include>
</includes>
</fileSet>
<fileSet>
<directory>${project.basedir}/getting-started</directory>
<outputDirectory>/generated-guides/getting-started/</outputDirectory>
<includes>
<include>pinned-guides</include>
</includes>
</fileSet>
<fileSet>
<directory>${project.basedir}/operator</directory>
<outputDirectory>/generated-guides/operator/</outputDirectory>
<includes>
<include>pinned-guides</include>
</includes>
</fileSet>
<fileSet>
<directory>${project.basedir}/high-availability</directory>
<outputDirectory>/generated-guides/high-availability/</outputDirectory>
<includes>
<include>pinned-guides</include>
</includes>
</fileSet>
</fileSets>
</assembly>


@ -1,3 +1,6 @@
:project_name: Keycloak
:archivebasename: keycloak
:archivedownloadurl: https://github.com/keycloak/keycloak/releases/download/{version}/keycloak-{version}.zip
:section: guide
:sections: guides
:archivedownloadurl: https://github.com/keycloak/keycloak/releases/download/{version}/keycloak-{version}.zip
:jdgserver_name: Infinispan


@ -0,0 +1,68 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Building blocks active-passive deployments"
summary="Overview of building blocks, alternatives and not considered options"
preview="true" >
The following building blocks are needed to set up an active-passive deployment with synchronous replication.
The building blocks link to a blueprint with an example configuration.
They are listed in the order in which they need to be installed.
include::partials/blueprint-disclaimer.adoc[]
== Prerequisites
* Understanding the concepts laid out in the <@links.ha id="concepts-active-passive-sync"/> {section}.
== Two sites with low-latency connection
Ensures that synchronous replication is available for both the database and the external {jdgserver_name}.
*Suggested setup:* Two AWS Availability Zones within the same AWS Region.
*Not considered:* Two regions on the same or different continents, as it would increase the latency and the likelihood of network failures.
Synchronous replication of databases as a service with Aurora Regional Deployments on AWS is available only within the same region.
== Environment for {project_name} and {jdgserver_name}
Ensures that the instances are deployed and restarted as needed.
*Suggested setup:* Red Hat OpenShift Service on AWS (ROSA) deployed in each availability zone.
*Not considered:* A stretched ROSA cluster which spans multiple availability zones, as this could be a single point of failure if misconfigured.
== Database
A synchronously replicated database across two sites.
*Blueprint:* <@links.ha id="deploy-aurora-multi-az"/>.
== {jdgserver_name}
An {jdgserver_name} deployment which leverages the {jdgserver_name}'s Cross-DC functionality.
*Blueprint:* <@links.ha id="deploy-infinispan-kubernetes-crossdc" /> using the {jdgserver_name} Operator, and connect the two sites using {jdgserver_name}'s Gossip Router.
*Not considered:* Direct interconnections between the Kubernetes clusters on the network layer.
It might be considered in the future.
== {project_name}
A clustered deployment of {project_name} in each site, connected to an external {jdgserver_name}.
*Blueprint:* <@links.ha id="deploy-keycloak-kubernetes" /> together with <@links.ha id="connect-keycloak-to-external-infinispan"/> and the Aurora database.
== Loadbalancer
A loadbalancer which checks the `/health/live` URL of the {project_name} deployment in each site.
*Blueprint:* <@links.ha id="deploy-aws-route53-loadbalancer"/>.
*Not considered:* AWS Global Accelerator as it supports only weighted traffic routing and not active-passive failover.
To support active-passive failover, additional logic using, for example, AWS CloudWatch and AWS Lambda would be necessary to simulate the active-passive handling by adjusting the weights when the probes fail.
</@tmpl.guide>


@ -0,0 +1,174 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for active-passive deployments"
summary="Understanding an active-passive deployment with synchronous replication"
preview="true" >
This topic describes a highly available active/passive setup and the behavior to expect. It outlines the requirements of the high availability active/passive architecture and describes the benefits and tradeoffs.
== When to use this setup
Use this setup to be able to fail over automatically in the event of a site failure, which reduces the likelihood of losing data or sessions. Manual interactions are usually required to restore the redundancy after the failover.
== Deployment, data storage and caching
Two independent {project_name} deployments running in different sites are connected with a low latency network connection.
Users, realms, clients, offline sessions, and other entities are stored in a database that is replicated synchronously across the two sites.
The data is also cached in the {project_name} embedded {jdgserver_name} as local caches.
When the data is changed in one {project_name} instance, that data is updated in the database, and an invalidation message is sent to the other site using the replicated `work` cache.
Session-related data is stored in the replicated caches of the embedded {jdgserver_name} of {project_name} and forwarded to the external {jdgserver_name}, which synchronously forwards the information to the external {jdgserver_name} running in the other site.
As session data of the external {jdgserver_name} is also cached in the embedded {jdgserver_name}, invalidation messages of the replicated `work` cache are needed for invalidation.
In the following paragraphs and diagrams, references to deploying {jdgserver_name} apply to the external {jdgserver_name}.
image::high-availability/active-passive-sync.dio.svg[]
== Causes of data and service loss
While this setup aims for high availability, the following situations can still lead to service or data loss:
* Network failures between the sites or failures of components can lead to short service downtimes while those failures are detected.
The service will be restored automatically.
The system is degraded until the failures are detected and the backup cluster is promoted to service requests.
* Once failures occur in the communication between the sites, manual steps are necessary to re-synchronize a degraded setup.
* Degraded setups can lead to service or data loss if additional components fail.
Monitoring is necessary to detect degraded setups.
== Failures which this setup can survive
[%autowidth]
|===
| Failure | Recovery | RPO^1^ | RTO^2^
| Database node
| If the writer instance fails, the database can promote a reader instance in the same or other site to be the new writer.
| No data loss
| Seconds to minutes (depending on the database)
| {project_name} node
| Multiple {project_name} instances run in each site. If one instance fails, it takes a few seconds for the other nodes to notice the change, and some incoming requests might receive an error message or be delayed for a few seconds.
| No data loss
| Less than one minute
| {jdgserver_name} node
| Multiple {jdgserver_name} instances run in each site. If one instance fails, it takes a few seconds for the other nodes to notice the change. Sessions are stored in at least two {jdgserver_name} nodes, so a single node failure does not lead to data loss.
| No data loss
| Less than one minute
| {jdgserver_name} cluster failure
| If the {jdgserver_name} cluster fails in the active site, {project_name} will not be able to communicate with the external {jdgserver_name}, and the {project_name} service will be unavailable.
Manual switchover to the secondary site is recommended.
Future versions will detect this situation and do an automatic failover.
When the {jdgserver_name} cluster is restored, its data will be out-of-sync with {project_name}.
Manual operations are required to get {jdgserver_name} in the primary site in sync with the secondary site.
| Loss of service
| Human intervention required
| Connectivity {jdgserver_name}
| If the connectivity between the two sites is lost, session information cannot be sent to the other site.
Incoming requests might receive an error message or be delayed for a few seconds.
The primary site marks the secondary site offline, and will stop sending data to the secondary.
The setup is degraded until the connection is restored and the session data is re-synchronized to the secondary site.
| No data loss^3^
| Less than one minute
| Connectivity database
| If the connectivity between the two sites is lost, the synchronous replication will fail, and it might take some time for the primary site to mark the secondary offline.
Some requests might receive an error message or be delayed for a few seconds.
Manual operations might be necessary depending on the database.
| No data loss^3^
| Seconds to minutes (depending on the database)
| Primary site
| If none of the {project_name} nodes are available, the loadbalancer will detect the outage and redirect the traffic to the secondary site.
Some requests might receive an error message while the loadbalancer has not detected the primary site failure.
The setup will be degraded until the primary site is back up and the session state has been manually synchronized from the secondary to the primary site.
| No data loss^3^
| Less than one minute
| Secondary site
| If the secondary site is not available, it will take a moment for the primary {jdgserver_name} and database to mark the secondary site offline.
Some requests might receive an error message while the detection takes place.
Once the secondary site is up again, the session state needs to be manually synced from the primary site to the secondary site.
| No data loss^3^
| Less than one minute
|===
.Table footnotes:
^1^ Recovery point objective, assuming all parts of the setup were healthy at the time this occurred. +
^2^ Recovery time objective. +
^3^ Manual operations needed to restore the degraded setup.
The statement "`No data loss`" depends on the setup not being degraded from previous failures, which includes completing any pending manual operations to resynchronize the state between the sites.
== Known limitations
Upgrades::
* On {project_name} or {jdgserver_name} version upgrades (major, minor and patch), all session data (except offline sessions) will be lost as neither supports zero-downtime upgrades.
Failovers::
* A successful failover requires a setup not degraded from previous failures.
All manual operations like a re-synchronization after a previous failure must be complete to prevent data loss.
Use monitoring to ensure degradations are detected and handled in a timely manner.
Switchovers::
* A successful switchover requires a setup not degraded from previous failures.
All manual operations like a re-synchronization after a previous failure must be complete to prevent data loss.
Use monitoring to ensure degradations are detected and handled in a timely manner.
Out-of-sync sites::
* The sites can become out of sync when a synchronous {jdgserver_name} request fails.
This situation is currently difficult to monitor, and it would need a full manual re-sync of {jdgserver_name} to recover.
Monitoring the number of cache entries in both sites and the {project_name} log file can show when a re-synchronization becomes necessary.
Manual operations::
* Manual operations that re-synchronize the {jdgserver_name} state between the sites will issue a full state transfer, which will put stress on the system (network, CPU, Java heap in {jdgserver_name} and {project_name}).
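The entry-count comparison mentioned under "Out-of-sync sites" can be sketched as follows.
This is a minimal illustration only: it assumes the per-cache entry counts of both sites have already been collected by some monitoring job, and the function name, the cache names in the usage example and the `tolerance` parameter are hypothetical.

```python
def out_of_sync_caches(primary_counts, secondary_counts, tolerance=0):
    """Return the caches whose entry counts differ by more than `tolerance`.

    Both arguments map a cache name to the number of entries observed in
    that site. How the counts are collected is outside this sketch.
    """
    drift = {}
    for cache in sorted(set(primary_counts) | set(secondary_counts)):
        primary = primary_counts.get(cache, 0)
        secondary = secondary_counts.get(cache, 0)
        if abs(primary - secondary) > tolerance:
            drift[cache] = (primary, secondary)
    return drift


# Illustrative numbers, not measurements.
primary = {"sessions": 100, "clientSessions": 100}
secondary = {"sessions": 100, "clientSessions": 90}
print(out_of_sync_caches(primary, secondary))
```

A small tolerance can avoid false alarms while entries are in flight; a persistent difference is a hint, not proof, that a manual re-synchronization is needed.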
== Questions and answers
Why synchronous database replication?::
A synchronously replicated database ensures that data written in the primary site is always available in the secondary site on failover and no data is lost.
Why synchronous {jdgserver_name} replication?::
A synchronously replicated {jdgserver_name} ensures that sessions created, updated and deleted in the primary site are always available in the secondary site on failover and no data is lost.
Why is a low-latency network between sites needed?::
Synchronous replication defers the response to the caller until the data is received at the secondary site.
For synchronous database replication and synchronous {jdgserver_name} replication, a low latency is necessary as each request can have potentially multiple interactions between the sites when data is updated which would amplify the latency.
Why active-passive?::
Some databases support a single writer instance with a reader instance which is then promoted to be the new writer once the original writer fails.
In such a setup, it is beneficial for the latency to have the writer instance in the same site as the currently active {project_name}.
Synchronous {jdgserver_name} replication can lead to deadlocks when entries in both sites are modified concurrently.
Is this setup limited to two sites?::
This setup could be extended to multiple sites, and there are no fundamental changes necessary to have, for example, three sites.
Once more sites are added, the overall latency between the sites increases, and the likelihood of network failures, and therefore short downtimes, increases as well.
Therefore, such a deployment is expected to have worse performance and inferior availability.
For now, it has been tested and documented with blueprints only for two sites.
Is a synchronous cluster less stable than an asynchronous cluster?::
An asynchronous setup would handle network failures between the sites gracefully, while the synchronous setup would delay requests and throw errors to the caller where the asynchronous setup would have deferred the writes to {jdgserver_name} or the database in the secondary site.
However, as the secondary site would never be fully up-to-date with the primary site, this setup could lead to data loss during failover.
This would include:
+
--
* Lost logouts, meaning sessions are still logged in at the secondary site although they were logged out at the primary site at the point of failover when using an asynchronous {jdgserver_name} replication of sessions.
* Lost changes leading to users being able to log in with an old password because database changes are not replicated to the secondary site at the point of failover when using an asynchronous database.
* Invalid caches leading to users being able to log in with an old password because cache invalidations are not propagated to the secondary site at the point of failover when using an asynchronous {jdgserver_name} replication.
--
+
Therefore, tradeoffs exist between high availability and consistency. The focus of this topic is to prioritize consistency over availability with {project_name}.
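The latency amplification described in the answer about the low-latency network can be illustrated with a rough model.
It is a simplification that assumes the cross-site interactions of a request happen one after another; the function name and all numbers are hypothetical and not taken from measurements.

```python
def request_latency_ms(local_processing_ms, cross_site_round_trip_ms, synchronous_interactions):
    """Estimate the latency of one request with sequential synchronous replication calls."""
    return local_processing_ms + cross_site_round_trip_ms * synchronous_interactions


# Same request, two different network round-trip times between the sites.
same_region = request_latency_ms(20, 2, 5)      # 20 + 2 * 5
cross_continent = request_latency_ms(20, 80, 5)  # 20 + 80 * 5
print(same_region, cross_continent)
```

Under these assumptions, the round-trip time between the sites dominates the response time as soon as a request needs more than a few synchronous interactions.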
== Next steps
Continue reading in the <@links.ha id="bblocks-active-passive-sync" /> {section} to find blueprints for the different building blocks.
</@tmpl.guide>


@ -0,0 +1,22 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for database connection pools"
summary="Understand these concepts to avoid resource exhaustion and congestion"
preview="true"
tileVisible="false" >
This section is intended for readers who want to understand the considerations and best practices for configuring database connection pools for {project_name}.
For a configuration where this is applied, visit <@links.ha id="deploy-keycloak-kubernetes" />.
== Concepts
Creating new database connections is expensive as it takes time.
Creating them when a request arrives will delay the response, so it is good to have them created before the request arrives.
It can also contribute to a https://en.wikipedia.org/wiki/Cache_stampede[stampede effect] where creating a lot of connections in a short time makes things worse as it slows down the system and blocks threads.
Closing a connection also invalidates all server-side statement caching for that connection.
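The idea of creating connections before requests arrive can be sketched in a few lines.
This is a language-neutral illustration written in Python, not how {project_name} implements its pool; the class name `PrewarmedPool` and the `connect` callable are hypothetical.

```python
import queue


class PrewarmedPool:
    """Fixed-size pool that opens all connections at startup (illustrative only)."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # Pay the connection cost once, before any request arrives.
            self._idle.put(connect())

    def borrow(self, timeout=None):
        # Blocks until a connection is idle; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def give_back(self, connection):
        # Returning instead of closing keeps server-side statement caches alive.
        self._idle.put(connection)


created = []


def fake_connect():
    # Stand-in for an expensive database connection setup.
    created.append(object())
    return created[-1]


pool = PrewarmedPool(fake_connect, size=3)
connection = pool.borrow()
pool.give_back(connection)
print(len(created))
```

Because connections are returned rather than closed, the number of created connections stays at the initial pool size regardless of how many requests borrow them.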
include::partials/database-connections/configure-db-connection-pool-best-practices.adoc[]
</@tmpl.guide>


@ -0,0 +1,96 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for sizing CPU and memory resources"
summary="Understand these concepts to avoid resource exhaustion and congestion"
preview="true"
tileVisible="false" >
Use this as a starting point to size a production environment.
Adjust the values for your environment as needed based on your load tests.
== Performance recommendations
[WARNING]
====
* Performance will be lowered when scaling to more Pods (due to additional overhead) and using a cross-datacenter setup (due to additional traffic and operations).
* Increased cache sizes can improve the performance when {project_name} instances run for a longer time. Still, those caches need to be filled when an instance is restarted.
* Use these values as a starting point and perform your own load tests before going into production.
====
Summary:
* The used CPU scales linearly with the number of requests up to the tested limit below.
* The used memory scales linearly with the number of active sessions up to the tested limit below.
Recommendations:
* The base memory usage for an inactive Pod is 1 GB of RAM.
* Leave 1 GB extra head-room for spikes of RAM.
* For each 100,000 active user sessions, add 500 MB per Pod in a three-node cluster (tested with up to 200,000 sessions).
+
This assumes that each user connects to only one client.
Memory requirements increase with the number of client sessions per user session (not tested yet).
* For each 40 user logins per second, 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+
{project_name} spends most of the CPU time hashing the password provided by the user.
* For each 450 client credential grants per second, 1 vCPU per Pod in a three-node cluster (tested with up to 2000 per second).
+
Most CPU time goes into creating new TLS connections, as each client runs only a single request.
* For each 350 refresh token requests per second, 1 vCPU per Pod in a three-node cluster (tested with up to 435 refresh token requests per second).
* Leave 200% extra head-room for CPU usage to handle spikes in the load.
This ensures a fast startup of the node, and sufficient capacity to handle failover tasks like, for example, re-balancing Infinispan caches, when one node fails.
Performance of {project_name} dropped significantly when its Pods were throttled in our tests.
=== Calculation example
Target size:
* 50,000 active user sessions
* 40 logins per second
* 450 client credential grants per second
* 350 refresh token requests per second
Limits calculated:
* CPU requested: 3 vCPU
+
(40 logins per second = 1 vCPU, 450 client credential grants per second = 1 vCPU, 350 refresh token requests per second = 1 vCPU)
* CPU limit: 9 vCPU
+
(Allow for three times the CPU requested to handle peaks, startups and failover tasks)
* Memory requested: 1.25 GB
+
(1 GB base memory plus 250 MB RAM for 50,000 active sessions)
* Memory limit: 2.25 GB
+
(adding 1 GB to the memory requested)
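The calculation above can be reproduced with a short script.
This is a sketch that only encodes the rates listed in the recommendations; the function name `size_pod` and the choice to round the CPU request up to whole vCPUs are assumptions made for illustration.

```python
import math

# Rates from the recommendations above (per Pod in a three-node cluster).
BASE_MEMORY_GB = 1.0
MEMORY_HEADROOM_GB = 1.0
MEMORY_GB_PER_100K_SESSIONS = 0.5
LOGINS_PER_VCPU = 40
CLIENT_CREDENTIAL_GRANTS_PER_VCPU = 450
REFRESH_TOKEN_REQUESTS_PER_VCPU = 350
CPU_LIMIT_FACTOR = 3  # the request plus 200% extra head-room


def size_pod(sessions, logins_per_s, grants_per_s, refreshes_per_s):
    """Return (cpu_request, cpu_limit, memory_request_gb, memory_limit_gb)."""
    cpu = (
        logins_per_s / LOGINS_PER_VCPU
        + grants_per_s / CLIENT_CREDENTIAL_GRANTS_PER_VCPU
        + refreshes_per_s / REFRESH_TOKEN_REQUESTS_PER_VCPU
    )
    cpu_request = math.ceil(cpu)  # assumption: round up to whole vCPUs
    cpu_limit = cpu_request * CPU_LIMIT_FACTOR
    memory_request = BASE_MEMORY_GB + sessions / 100_000 * MEMORY_GB_PER_100K_SESSIONS
    memory_limit = memory_request + MEMORY_HEADROOM_GB
    return cpu_request, cpu_limit, memory_request, memory_limit


print(size_pod(50_000, 40, 450, 350))
```

Treat the result as a starting point for load tests, in the same way as the manual calculation.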
== Reference architecture
The following setup was used to retrieve the settings above to run tests of about 10 minutes for different scenarios:
* OpenShift 4.13.x deployed on AWS via ROSA.
* Machinepool with `m5.4xlarge` instances.
* {project_name} deployed with the Operator and 3 pods.
* Default user password hashing with PBKDF2 27,500 hash iterations.
* Database seeded with 100,000 users and 100,000 clients.
* Infinispan caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
* All sessions in distributed caches as per default, with two owners per entry, allowing one failing pod without losing data.
* PostgreSQL deployed inside the same OpenShift with ephemeral storage.
+
Using a database with persistent storage will have longer database latencies, which might lead to longer response times; still, the throughput should be similar.
</@tmpl.guide>


@ -0,0 +1,69 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Concepts for configuring thread pools"
summary="Understand these concepts to avoid resource exhaustion and congestion"
preview="true"
tileVisible="false" >
This section is intended for readers who want to understand the considerations and best practices for configuring thread pools for {project_name}.
For a configuration where this is applied, visit <@links.ha id="deploy-keycloak-kubernetes" />.
== Concepts
=== Quarkus executor pool
{project_name} requests, as well as all probes, are handled by the Quarkus executor pool.
The Quarkus executor thread pool is configured in https://quarkus.io/guides/all-config#quarkus-core_quarkus.thread-pool.max-threads[`quarkus.thread-pool.max-threads`] and has a maximum size of at least 200 threads.
Depending on the available CPU cores, it can grow even larger.
Threads are created as needed, and will end when no longer needed, so the system will scale up and down as needed.
When the load and the number of threads increase, the bottleneck will usually be the database connections.
Once a request cannot acquire a database connection, it will fail with a message in the log like `Unable to acquire JDBC Connection`.
The caller will receive a response with a 5xx HTTP status code indicating a server side error.
The number of threads in the executor pool is an order of magnitude larger than the number of database connections, and requests fail when no database connection becomes available within the https://quarkus.io/guides/all-config#quarkus-agroal_quarkus.datasource.jdbc.acquisition-timeout[`quarkus.datasource.jdbc.acquisition-timeout`] (5 seconds by default). This results in a form of https://en.wikipedia.org/wiki/Demand_response#Load_shedding[load-shedding behavior] where the server returns an error response instead of queueing requests for an indefinite amount of time.
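The effect of an acquisition timeout can be illustrated with a small stand-alone model.
It is a sketch of the general technique and not the connection pool used by {project_name}; the class name `ConnectionSlots` and the short timeout in the usage example are hypothetical.

```python
import threading


class ConnectionSlots:
    """Models a pool of `size` connections with an acquisition timeout."""

    def __init__(self, size, acquisition_timeout_s):
        self._slots = threading.BoundedSemaphore(size)
        self._timeout = acquisition_timeout_s

    def acquire(self):
        # Wait at most the configured timeout, then fail the request
        # instead of letting it queue for an indefinite amount of time.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("Unable to acquire connection")

    def release(self):
        self._slots.release()


slots = ConnectionSlots(size=1, acquisition_timeout_s=0.05)
slots.acquire()  # the only connection is now in use
try:
    slots.acquire()  # a second caller gives up after the timeout
    shed = False
except TimeoutError:
    shed = True
slots.release()
print(shed)
```

Failing fast in this way bounds the time a caller waits, at the cost of returning an error while the pool is exhausted.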
=== JGroups connection pool
The combined number of executor threads in all {project_name} nodes in the cluster should not exceed the number of threads available in the JGroups thread pool to avoid the error `org.jgroups.util.ThreadPool: thread pool is full`.
To see the error the first time it happens, the system property `jgroups.thread_dumps_threshold` needs to be set to `1`, as otherwise the message appears only after 10000 threads have been rejected.
--
include::partials/threads/executor-jgroups-thread-calculation.adoc[]
--
Use the metric `vendor_jgroups_tcp_get_thread_pool_size` to monitor the total number of JGroups threads in the pool and `vendor_jgroups_tcp_get_thread_pool_size_active` for the threads active in the pool.
This is useful to verify that limiting the Quarkus thread pool size keeps the number of active JGroups threads below the maximum JGroups thread pool size.
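The rule that the combined executor threads should not exceed the JGroups thread pool can be checked with simple arithmetic.
The following sketch uses hypothetical function names, and the pool size of 200 in the usage example is an assumed placeholder; take the real value from your JGroups configuration.

```python
def max_executor_threads_per_node(jgroups_pool_size, node_count):
    """Largest per-node executor pool that keeps the cluster total within the JGroups pool."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return jgroups_pool_size // node_count


def fits_jgroups_pool(executor_threads_per_node, node_count, jgroups_pool_size):
    """True if all executor threads in the cluster together fit into the JGroups pool."""
    return executor_threads_per_node * node_count <= jgroups_pool_size


# Assumed placeholder values for illustration.
print(max_executor_threads_per_node(200, 3))
print(fits_jgroups_pool(200, 3, 200))
```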
[#load-shedding]
=== Load Shedding
By default, {project_name} will queue all incoming requests infinitely, even if the request processing stalls.
This will use additional memory in the Pod, can exhaust resources in the load balancers, and the requests will eventually time out on the client side without the client knowing if the request has been processed.
To limit the number of queued requests in {project_name}, set an additional Quarkus configuration option.
Configure `quarkus.thread-pool.queue-size` to specify a maximum queue length to allow for effective load shedding once this queue size is exceeded: {project_name} will return HTTP Status code 500 (server error).
Assuming a {project_name} Pod processes around 200 requests per second, a queue of 1000 would lead to maximum waiting times of around 5 seconds.
// KC22.0.6 - this is still 500
When this setting is active, requests that exceed the number of queued requests will return with an HTTP 503 error.
{project_name} logs the error message in its log.
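The relationship between queue size, throughput and waiting time, and the shedding itself, can be sketched as follows.
This is an illustration of the general technique using Python's standard `queue` module, not the Quarkus implementation; the function names are hypothetical.

```python
import queue


def max_queue_wait_seconds(queue_size, requests_per_second):
    """Worst-case waiting time of the last request in a full queue."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return queue_size / requests_per_second


def offer(request_queue, request):
    """Enqueue a request, or shed it (return False) when the bounded queue is full."""
    try:
        request_queue.put_nowait(request)
        return True
    except queue.Full:
        return False


# The example from the text: 1000 queued requests at 200 requests per second.
print(max_queue_wait_seconds(1000, 200))

# A bounded queue of two entries sheds the third request.
pending = queue.Queue(maxsize=2)
print([offer(pending, n) for n in range(3)])
```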
[#probes]
=== Probes
All health probes are handled in the Quarkus executor worker pool by default.
Only the liveness probe is non-blocking.
Future versions of {project_name} and Quarkus plan to make the other probes non-blocking as well.
=== OS Resources
When running on Linux, Java needs available file handles to create threads.
Therefore, the limit on open files (as retrieved with `ulimit -n` on Linux) needs to provide head-room for {project_name} to increase the number of threads as needed.
Each thread will also consume memory, and the container memory limits need to be set to a value that allows for this or the Pod will be killed by Kubernetes.
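The open files limit can also be inspected programmatically, for example from a diagnostic script running inside the container.
This sketch uses Python's standard `resource` module, which is available on Unix-like systems only; the function names and the idea of comparing against a planned number of handles are assumptions for illustration.

```python
import resource


def open_files_limit():
    """Return the (soft, hard) limit on open files, as `ulimit -n` reports the soft one."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_headroom(required_handles):
    """True if the soft limit is unlimited or at least `required_handles`."""
    soft, _hard = open_files_limit()
    return soft == resource.RLIM_INFINITY or soft >= required_handles


soft_limit, hard_limit = open_files_limit()
print(soft_limit, hard_limit)
```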
</@tmpl.guide>


@ -0,0 +1,79 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Connect {project_name} with an external {jdgserver_name}"
summary="Building block for an Infinispan deployment on Kubernetes"
preview="true"
tileVisible="false" >
This topic describes advanced {jdgserver_name} configurations for {project_name} on Kubernetes.
== Prerequisites
* <@links.ha id="deploy-keycloak-kubernetes" /> as it will be extended.
* <@links.ha id="deploy-infinispan-kubernetes-crossdc" />.
== Procedure
. Prepare an {jdgserver_name} Cache configuration XML from the file `cache-ispn.xml` which is part of the {project_name} distribution:
.. For each `distributed-cache` entry, add the tag `<persistence />` as shown below.
+
[source,xml,indent=0]
----
include::examples/src/kcb-infinispan-cache-remote-store-config.xml[tag=keycloak-ispn-remotestore]
----
<1> New tag `<persistence />` to connect it to the remote store.
<2> For the address to the remote store, reference two environment variables for host name and port number.
<3> For authentication, reference two environment variables for username and password.
<4> To secure the remote store connection, use the Kubernetes mechanisms of the pre-configured truststore.
.. For each `replicated-cache` entry, add the tag `<persistence />` as shown below.
For additional information on the {jdgserver_name} configuration options, see the https://docs.jboss.org/infinispan/14.0/configdocs/infinispan-config-14.0.html[Infinispan configuration schema reference].
+
[source,xml,indent=0]
----
include::examples/src/kcb-infinispan-cache-remote-store-config.xml[tag=keycloak-ispn-remotestore-work]
----
. Place the {jdgserver_name} Cache configuration XML in a ConfigMap.
+
[source,yaml]
----
include::examples/generated/keycloak-ispn.yaml[tag=keycloak-ispn-configmap]
...
----
. Create a Secret with the username and password to connect to the external {jdgserver_name} deployment:
+
[source,yaml]
----
include::examples/generated/keycloak-ispn.yaml[tag=keycloak-ispn-secret]
----
. Extend the {project_name} Custom Resource with `additionalOptions` and extend the `podTemplate` as shown below.
+
[NOTE]
====
* The new `additionalOptions` entries starting with `remote-store` used here are not official {project_name} configurations.
Instead, they provide their values to environment variables that are then referenced in the {jdgserver_name} XML configuration.
* All the memory, resource and database configurations are skipped from the CR below as they have been described in <@links.ha id="deploy-keycloak-kubernetes" /> {section} already.
Administrators should leave those configurations untouched.
====
+
[source,yaml]
----
include::examples/generated/keycloak-ispn.yaml[tag=keycloak-ispn]
----
<1> Custom cache configuration XML file definition, which includes configuration for remote or embedded {jdgserver_name} store.
<2> The hostname and port of the remote cache {jdgserver_name} cluster.
<3> The credentials required, username and password, to access the remote cache {jdgserver_name} cluster.
<4> `jboss.site.name` is an arbitrary {jdgserver_name} site name which {project_name} needs for its embedded {jdgserver_name} deployment when a remote store is used.
This site name is related only to the embedded {jdgserver_name} and does not need to match any value from the external {jdgserver_name} deployment.
However, matching the `jboss.site.name` with the external {jdgserver_name} deployment site name helps debugging possible future issues.
If you are using multiple sites for {project_name} in a cross-DC setup such as <@links.ha id="deploy-infinispan-kubernetes-crossdc" />, the site name must be different in each site.
<5> Mounting the cache configuration Volume in Kubernetes.
<6> Defining the cache configuration Volume using the already created ConfigMap in Kubernetes.
</@tmpl.guide>


@ -0,0 +1,63 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploy AWS Aurora in multiple availability zones"
summary="Building block for a database"
preview="true"
tileVisible="false" >
This topic describes how to deploy an Aurora regional deployment of a PostgreSQL instance across multiple availability zones to tolerate one or more availability zone failures in a given AWS region.
This deployment is intended to be used with the setup described in the <@links.ha id="concepts-active-passive-sync"/> {section}.
Use this deployment with the other building blocks outlined in the <@links.ha id="bblocks-active-passive-sync"/> {section}.
include::partials/blueprint-disclaimer.adoc[]
== Architecture
Aurora database clusters consist of multiple Aurora database instances, with one instance designated as the primary writer and all others as backup readers.
To ensure high availability in the event of availability zone failures, Aurora allows database instances to be deployed across multiple zones in a single AWS region.
In the event of a failure in the availability zone that is hosting the primary database instance, Aurora automatically heals itself and promotes a reader instance from a non-failed availability zone to be the new writer instance.
.Aurora Multiple Availability Zone Deployment
image::high-availability/aurora-multi-az.dio.svg[]
See the https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html[AWS Aurora documentation] for more details on the semantics provided by Aurora databases.
This documentation follows AWS best practices and creates a private Aurora database that is not exposed to the Internet.
To access the database from a ROSA cluster, <<establish-peering-connections-with-rosa-clusters,establish a peering connection between the database and the ROSA cluster>>.
== Procedure
The following procedure contains two sections:
* Creation of an Aurora Multi-AZ database cluster with the name "keycloak-aurora" in eu-west-1.
* Creation of a peering connection between the ROSA cluster(s) and the Aurora VPC to allow applications deployed on the ROSA clusters to establish connections with the database.
=== Create Aurora database Cluster
include::partials/aurora/aurora-multiaz-create-procedure.adoc[]
[#establish-peering-connections-with-rosa-clusters]
=== Establish Peering Connections with ROSA clusters
Perform these steps once for each ROSA cluster that contains a {project_name} deployment.
include::partials/aurora/aurora-create-peering-connections.adoc[]
== Verifying the connection
include::partials/aurora/aurora-verify-peering-connections.adoc[]
== Deploying {project_name}
Now that an Aurora database has been established and linked with all of your ROSA clusters, the next step is to deploy {project_name} as described in the <@links.ha id="deploy-keycloak-kubernetes" /> {section} with the JDBC URL configured to use the Aurora database writer endpoint.
To do this, create a `{project_name}` CR with the following adjustments:
. Update `spec.db.url` to be `jdbc:postgresql://$HOST:5432/keycloak` where `$HOST` is the
<<aurora-writer-url, Aurora writer endpoint URL>>.
. Ensure that the Secrets referenced by `spec.db.usernameSecret` and `spec.db.passwordSecret` contain usernames and passwords defined when creating Aurora.
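
As a rough illustration of the two adjustments above, the following sketch prints the `spec.db` fragment of the CR for a given writer endpoint. The helper name `render_db_spec`, the Secret name `keycloak-db-secret`, its keys, and the example hostname are assumptions made for this example; substitute the values used when creating Aurora.

[source,bash]
----
# Hypothetical helper: prints the db section of the CR for the given Aurora writer endpoint.
render_db_spec() {
  host="$1"
  cat <<EOF
spec:
  db:
    vendor: postgres
    url: jdbc:postgresql://$host:5432/keycloak
    usernameSecret:
      name: keycloak-db-secret   # assumed Secret name
      key: username              # assumed key
    passwordSecret:
      name: keycloak-db-secret   # assumed Secret name
      key: password              # assumed key
EOF
}

# Example hostname only; use the Aurora writer endpoint of your cluster.
render_db_spec "keycloak-aurora.cluster-example.eu-west-1.rds.amazonaws.com"
----

Merge the printed fragment into the `spec` of your existing CR rather than applying it on its own.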
</@tmpl.guide>

View file

@ -0,0 +1,292 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploy an AWS Route 53 loadbalancer"
summary="Building block for a loadbalancer"
preview="true"
tileVisible="false" >
This topic describes the procedure required to configure DNS-based failover for Multi-AZ {project_name} clusters using AWS Route53 for an active/passive setup. These instructions are intended for use with the setup described in the <@links.ha id="concepts-active-passive-sync"/> {section}.
Use it together with the other building blocks outlined in the <@links.ha id="bblocks-active-passive-sync"/> {section}.
include::partials/blueprint-disclaimer.adoc[]
== Architecture
All {project_name} client requests are routed by a DNS name managed by Route53 records.
Route53 is responsible for ensuring that all client requests are routed to the Primary cluster when it is available and healthy, or to the Backup cluster if the primary availability zone or {project_name} deployment fails.
If the primary site fails, the DNS changes will need to propagate to the clients.
Depending on the client's configuration, the propagation may take some minutes.
When using mobile connections, some internet providers might not respect the TTL of the DNS entries, which can lead to an extended time before the clients can connect to the new site.
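
To get a feeling for how long resolvers may cache a record, you can inspect its TTL. The following sketch assumes that `dig` is installed; the helper name `ttl_from_answer` is made up for this example, and the domain is the example client domain used in the procedure of this guide.

[source,bash]
----
# Hypothetical helper: reads `dig +noall +answer` output on stdin and prints
# the TTL (second column) of the first answer record.
ttl_from_answer() {
  awk 'NR == 1 { print $2 }'
}

# Offline demonstration with a sample answer line (192.0.2.10 is a documentation address).
printf 'client.keycloak-benchmark.com. 60 IN A 192.0.2.10\n' | ttl_from_answer

# Against a live record (requires dig and network access):
#   dig +noall +answer client.keycloak-benchmark.com A | ttl_from_answer
----

The value is a lower bound on the propagation delay only if resolvers honor it, which, as noted above, is not always the case.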
.AWS Route53 Failover
image::high-availability/route53-multi-az-failover.svg[]
Two OpenShift Routes are exposed on both the Primary and Backup ROSA clusters.
The first Route uses the Route53 DNS name to service client requests, whereas the second Route is used by Route53 to monitor the health of the {project_name} cluster.
== Prerequisites
* Deployment of {project_name} as described in <@links.ha id="deploy-keycloak-kubernetes" /> on a ROSA cluster in two AWS availability zones in one AWS region
* An owned domain for client requests to be routed through
== Procedure
. [[create-hosted-zone]]Create a https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/CreatingHostedZone.html[Route53 Hosted Zone] using the root domain name through which you want all {project_name} clients to connect.
+
Take note of the "Hosted zone ID", because this ID is required in later steps.
. Retrieve the "Hosted zone ID" and DNS name associated with each ROSA cluster.
+
For both the Primary and Backup cluster, perform the following steps:
+
.. Log in to the ROSA cluster.
+
.. Obtain the cluster VPC ID.
+
.Command:
[source,bash]
----
<#noparse>
NODE=$(kubectl get nodes --selector=node-role.kubernetes.io/worker \
-o jsonpath='{.items[0].metadata.name}'
)
aws ec2 describe-instances \
--filters "Name=private-dns-name,Values=${NODE}" \
--query 'Reservations[*].Instances[*].VpcId' \
--region eu-west-1 \#<1>
--output text
</#noparse>
----
<1> The AWS region hosting your ROSA cluster
+
.Output:
[source]
----
vpc-08572eedcb77c9f87
----
+
.. [[hosted_zone_id]]Retrieve the cluster LoadBalancer Hosted Zone ID and DNS hostname
+
.Command:
[source,bash]
----
aws elb describe-load-balancers \
--query "LoadBalancerDescriptions[?VPCId=='vpc-08572eedcb77c9f87'].{CanonicalHostedZoneNameID:CanonicalHostedZoneNameID,DNSName:DNSName}" \#<1>
--region eu-west-1 \
--output json
----
<1> Use the VPC ID retrieved in the previous step
+
.Output:
[source,json]
----
[
{
"CanonicalHostedZoneNameID": "Z32O12XQLNTSW2", #<1>
"DNSName": "ab50395cd04304a539af5b8854325e22-773464857.eu-west-1.elb.amazonaws.com"
}
]
----
+
. Create Route53 health checks
+
.Command:
[source,bash]
----
<#noparse>
function createHealthCheck() {
# Creating a hash of the caller reference to allow for names longer than 64 characters
REF=($(echo $1 | sha1sum ))
aws route53 create-health-check \
--caller-reference "$REF" \
--query "HealthCheck.Id" \
--no-cli-pager \
--output text \
--health-check-config '
{
"Type": "HTTPS",
"ResourcePath": "/health/live",
"FullyQualifiedDomainName": "'$1'",
"Port": 443,
"RequestInterval": 30,
"FailureThreshold": 1,
"EnableSNI": true
}
'
}
CLIENT_DOMAIN="client.keycloak-benchmark.com" #<1>
PRIMARY_DOMAIN="primary.${CLIENT_DOMAIN}" #<2>
BACKUP_DOMAIN="backup.${CLIENT_DOMAIN}" #<3>
createHealthCheck ${PRIMARY_DOMAIN}
createHealthCheck ${BACKUP_DOMAIN}
</#noparse>
----
<1> The domain which {project_name} clients should connect to.
This should be the same as, or a subdomain of, the root domain used to create the xref:create-hosted-zone[Hosted Zone].
<2> The subdomain that will be used for health probes on the Primary cluster
<3> The subdomain that will be used for health probes on the Backup cluster
+
.Output:
[source,bash]
----
233e180f-f023-45a3-954e-415303f21eab #<1>
799e2cbb-43ae-4848-9b72-0d9173f04912 #<2>
----
<1> The ID of the Primary Health check
<2> The ID of the Backup Health check
+
. Create the Route53 record set
+
.Command:
[source,bash]
----
<#noparse>
HOSTED_ZONE_ID="Z09084361B6LKQQRCVBEY" #<1>
PRIMARY_LB_HOSTED_ZONE_ID="Z32O12XQLNTSW2"
PRIMARY_LB_DNS=ab50395cd04304a539af5b8854325e22-773464857.eu-west-1.elb.amazonaws.com
PRIMARY_HEALTH_ID=233e180f-f023-45a3-954e-415303f21eab
BACKUP_LB_HOSTED_ZONE_ID="Z32O12XQLNTSW2"
BACKUP_LB_DNS=a184a0e02a5d44a9194e517c12c2b0ec-1203036292.eu-west-1.elb.amazonaws.com
BACKUP_HEALTH_ID=799e2cbb-43ae-4848-9b72-0d9173f04912
aws route53 change-resource-record-sets \
--hosted-zone-id "${HOSTED_ZONE_ID}" \
--query "ChangeInfo.Id" \
--output text \
--change-batch '
{
"Comment": "Creating Record Set for '${CLIENT_DOMAIN}'",
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "'${PRIMARY_DOMAIN}'",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "'${PRIMARY_LB_HOSTED_ZONE_ID}'",
"DNSName": "'${PRIMARY_LB_DNS}'",
"EvaluateTargetHealth": true
}
}
}, {
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "'${BACKUP_DOMAIN}'",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "'${BACKUP_LB_HOSTED_ZONE_ID}'",
"DNSName": "'${BACKUP_LB_DNS}'",
"EvaluateTargetHealth": true
}
}
}, {
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "'${CLIENT_DOMAIN}'",
"Type": "A",
"SetIdentifier": "client-failover-primary-'${SUBDOMAIN}'",
"Failover": "PRIMARY",
"HealthCheckId": "'${PRIMARY_HEALTH_ID}'",
"AliasTarget": {
"HostedZoneId": "'${HOSTED_ZONE_ID}'",
"DNSName": "'${PRIMARY_DOMAIN}'",
"EvaluateTargetHealth": true
}
}
}, {
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "'${CLIENT_DOMAIN}'",
"Type": "A",
"SetIdentifier": "client-failover-backup-'${SUBDOMAIN}'",
"Failover": "SECONDARY",
"HealthCheckId": "'${BACKUP_HEALTH_ID}'",
"AliasTarget": {
"HostedZoneId": "'${HOSTED_ZONE_ID}'",
"DNSName": "'${BACKUP_DOMAIN}'",
"EvaluateTargetHealth": true
}
}
}]
}
'
</#noparse>
----
<1> The ID of the xref:create-hosted-zone[Hosted Zone] created earlier
+
.Output:
[source]
----
/change/C053410633T95FR9WN3YI
----
+
. Wait for the Route53 records to be updated
+
.Command:
[source,bash]
----
aws route53 wait resource-record-sets-changed --id /change/C053410633T95FR9WN3YI
----
+
. Update or create the {project_name} deployment
+
For both the Primary and Backup cluster, perform the following steps:
+
.. Log in to the ROSA cluster
+
.. Ensure the {project_name} CR has the following configuration
+
[source,yaml]
----
<#noparse>
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak
spec:
  hostname:
    hostname: ${CLIENT_DOMAIN} # <1>
</#noparse>
----
<1> The domain clients use to connect to {project_name}
+
To ensure that request forwarding works, edit the {project_name} CR to specify the hostname through which clients will access the {project_name} instances.
This hostname must be the `$CLIENT_DOMAIN` used in the Route53 configuration.
+
.. Create health check Route
+
.Command:
[source,bash]
----
cat <<EOF | kubectl apply -n $NAMESPACE -f - #<1>
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: aws-health-route
spec:
  host: $DOMAIN #<2>
  port:
    targetPort: https
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: passthrough
  to:
    kind: Service
    name: keycloak-service
    weight: 100
  wildcardPolicy: None
EOF
----
<1> `$NAMESPACE` should be replaced with the namespace of your {project_name} deployment
<2> `$DOMAIN` should be replaced with either the `PRIMARY_DOMAIN` or `BACKUP_DOMAIN`, depending on whether the current cluster is the Primary or Backup cluster, respectively.
== Verify
Navigate to the chosen CLIENT_DOMAIN in your local browser and log in to the {project_name} console.
To test that failover works as expected, log in to the Primary cluster and scale the {project_name} deployment to zero Pods.
Scaling will cause the Primary's health checks to fail and Route53 should start routing traffic to the {project_name} Pods on the Backup cluster.
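
The failover test can be scripted. The following is a minimal sketch, not a verified procedure: it assumes that the {project_name} CR is named `keycloak` as in the example above, that `NAMESPACE` and `CLIENT_DOMAIN` are set in your shell, and that scaling is done by setting `spec.instances` on the CR. The helper name `wait_until`, the retry values, and the polled URL path are choices made for this example.

[source,bash]
----
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between attempts. Returns non-zero if it never succeeds.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# On the Primary cluster, scale to zero Pods (assumes the CR is named "keycloak"):
#   kubectl patch keycloaks.k8s.keycloak.org/keycloak -n "$NAMESPACE" \
#     --type merge -p '{"spec":{"instances":0}}'
# Then poll the client domain until a site answers again:
#   wait_until 30 10 curl -sf -o /dev/null "https://$CLIENT_DOMAIN/realms/master"
----

A successful response only shows that some site is serving requests. To confirm that it is the Backup cluster, check the Route53 health check status or the {project_name} Pod logs on the Backup cluster.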
</@tmpl.guide>

View file

@ -0,0 +1,207 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploy {jdgserver_name} for HA with the {jdgserver_name} Operator"
summary="Building block for an {jdgserver_name} deployment on Kubernetes"
preview="true"
tileVisible="false" >
include::partials/infinispan/infinispan-attributes.adoc[]
This {section} describes the procedures required to deploy {jdgserver_name} in a multiple-cluster environment (cross-site).
For simplicity, this topic uses the minimum configuration possible that allows {project_name} to be used with an external {jdgserver_name}.
This {section} assumes two {ocp} clusters named `{site-a}` and `{site-b}`.
This is a building block following the concepts described in the <@links.ha id="concepts-active-passive-sync" /> {section}.
See the <@links.ha id="introduction" /> {section} for an overview.
== Architecture
This setup deploys two synchronously replicating {jdgserver_name} clusters in two sites with a low-latency network connection.
An example of this scenario could be two availability zones in one AWS region.
{project_name}, loadbalancer and database have been removed from the following diagram for simplicity.
image::high-availability/infinispan-crossdc-az.dio.svg[]
== Prerequisites
include::partials/infinispan/infinispan-prerequisites.adoc[]
== Procedure
include::partials/infinispan/infinispan-install-operator.adoc[]
include::partials/infinispan/infinispan-credentials.adoc[]
+
These commands must be executed on both {ocp} clusters.
. Create a service account.
+
A service account is required to establish a connection between clusters.
The {ispn-operator} uses it to inspect the network configuration from the remote site and to configure the local {jdgserver_name} cluster accordingly.
+
For more details, see the {operator-docs}#managed-cross-site-connections_cross-site[Managing Cross-Site Connections] documentation.
+
.. First, create the service account and generate an access token in both {ocp} clusters.
+
.Create the service account in `{site-a}`
[source,bash,subs="+attributes"]
----
kubectl create sa -n {ns} {sa}
oc policy add-role-to-user view -n {ns} -z {sa}
kubectl create token -n {ns} {sa} > {site-a}-token.txt
----
+
.Create the service account in `{site-b}`
[source,bash,subs="+attributes"]
----
kubectl create sa -n {ns} {sa}
oc policy add-role-to-user view -n {ns} -z {sa}
kubectl create token -n {ns} {sa} > {site-b}-token.txt
----
+
.. The next step is to deploy the token from `{site-a}` into `{site-b}` and the reverse:
+
.Deploy `{site-b}` token into `{site-a}`
[source,bash,subs="+attributes"]
----
kubectl create secret generic -n {ns} {sa-secret} \
--from-literal=token="$(cat {site-b}-token.txt)"
----
+
.Deploy `{site-a}` token into `{site-b}`
[source,bash,subs="+attributes"]
----
kubectl create secret generic -n {ns} {sa-secret} \
--from-literal=token="$(cat {site-a}-token.txt)"
----
. Create TLS secrets
+
In this {section}, {jdgserver_name} uses an {ocp} Route for the cross-site communication.
It uses the SNI extension of TLS to direct the traffic to the correct Pods.
To achieve that, JGroups uses TLS sockets, which require a Keystore and Truststore with the correct certificates.
+
For more information, see the {operator-docs}#securing-cross-site-connections_cross-site[Securing Cross Site Connections] documentation or this https://developers.redhat.com/learn/openshift/cross-site-and-cross-applications-red-hat-openshift-and-red-hat-data-grid[Red Hat Developer Guide].
+
Upload the Keystore and the Truststore in an {ocp} Secret.
The secret contains the file content, the password to access it, and the type of the store.
Instructions for creating the certificates and the stores are beyond the scope of this guide.
+
To upload the Keystore as a Secret, use the following command:
+
.Deploy a Keystore
[source,bash,subs="+attributes"]
----
kubectl -n {ns} create secret generic {ks-secret} \
--from-file=keystore.p12="./certs/keystore.p12" \ # <1>
--from-literal=password=secret \ #<2>
--from-literal=type=pkcs12 #<3>
----
<1> The filename and the path to the Keystore.
<2> The password to access the Keystore.
<3> The Keystore type.
+
To upload the Truststore as a Secret, use the following command:
+
.Deploy a Truststore
[source,bash,subs="+attributes"]
----
kubectl -n {ns} create secret generic {ts-secret} \
--from-file=truststore.p12="./certs/truststore.p12" \ # <1>
--from-literal=password=caSecret \ # <2>
--from-literal=type=pkcs12 # <3>
----
<1> The filename and the path to the Truststore.
<2> The password to access the Truststore.
<3> The Truststore type.
+
NOTE: Keystore and Truststore must be uploaded in both {ocp} clusters.
. Create an {jdgserver_name} Cluster with Cross-Site enabled
+
The {operator-docs}#setting-up-xsite[Setting Up Cross-Site] documentation provides all the information on how to create and configure your {jdgserver_name} cluster with cross-site enabled, including the previous steps.
+
A basic example is provided in this {section} using the credentials, tokens, and TLS Keystore/Truststore created by the commands from the previous steps.
+
.The {jdgserver_name} CR for `{site-a}`
[source,yaml]
----
include::examples/generated/ispn-site-a.yaml[tag=infinispan-crossdc]
----
<1> The cluster name
<2> Allows the cluster to be monitored by Prometheus.
<3> If using a custom credential, configure the secret name here.
<4> The name of the local site, in this case `{site-a}`.
<5> Exposing the cross-site connection using {ocp} Route.
<6> The secret name where the Keystore exists as defined in the previous step.
<7> The alias of the certificate inside the Keystore.
<8> The secret key (filename) of the Keystore as defined in the previous step.
<9> The secret name where the Truststore exists as defined in the previous step.
<10> The secret key (filename) of the Truststore as defined in the previous step.
<11> The remote site's name, in this case `{site-b}`.
<12> The namespace of the {jdgserver_name} cluster from the remote site.
<13> The {ocp} API URL for the remote site.
<14> The secret with the access token to authenticate into the remote site.
+
For `{site-b}`, the {jdgserver_name} CR looks similar to the above.
Note the differences in points 4, 11, and 13.
+
.The {jdgserver_name} CR for `{site-b}`
[source,yaml]
----
include::examples/generated/ispn-site-b.yaml[tag=infinispan-crossdc]
----
. Creating the caches for {project_name}.
+
{project_name} requires the following caches to be present: `sessions`, `actionTokens`, `authenticationSessions`, `offlineSessions`, `clientSessions`, `offlineClientSessions`, `loginFailures`, and `work`.
+
The {jdgserver_name} {operator-docs}#creating-caches[Cache CR] allows deploying the caches in the {jdgserver_name} cluster.
Cross-site needs to be enabled per cache as documented by {xsite-docs}[Cross Site Documentation].
The documentation contains more details about the options used by this {section}.
The following example shows the Cache CR for `{site-a}`.
+
.sessions in `{site-a}`
[source,yaml]
----
include::examples/generated/ispn-site-a.yaml[tag=infinispan-cache-sessions]
----
<1> The cross-site merge policy, invoked when there is a write-write conflict.
Set this for the caches `sessions`, `authenticationSessions`, `offlineSessions`, `clientSessions` and `offlineClientSessions`, and do not set it for all other caches.
<2> The remote site name.
<3> The cross-site communication, in this case, `SYNC`.
+
For `{site-b}`, the Cache CR is similar except in point 2.
+
.sessions in `{site-b}`
[source,yaml]
----
include::examples/generated/ispn-site-b.yaml[tag=infinispan-cache-sessions]
----
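
The rule for the merge policy stated in the callouts above can be summarized as a small lookup. The following sketch only encodes the cache names listed in this {section}; the function name `needs_merge_policy` is made up for this example.

[source,bash]
----
# Hypothetical helper: succeeds if the named cache should set a cross-site merge policy.
needs_merge_policy() {
  case "$1" in
    sessions|authenticationSessions|offlineSessions|clientSessions|offlineClientSessions)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Print the decision for every cache that is required to be present.
for cache in sessions actionTokens authenticationSessions offlineSessions \
             clientSessions offlineClientSessions loginFailures work; do
  if needs_merge_policy "$cache"; then
    echo "$cache: set merge policy"
  else
    echo "$cache: no merge policy"
  fi
done
----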
[#verifying-the-deployment]
== Verifying the deployment
Confirm that the {jdgserver_name} cluster is formed, and the cross-site connection is established between the {ocp} clusters.
.Wait until the {jdgserver_name} cluster is formed
[source,bash,subs="+attributes"]
----
kubectl wait --for condition=WellFormed --timeout=300s infinispans.infinispan.org -n {ns} {cluster-name}
----
.Wait until the {jdgserver_name} cross-site connection is established
[source,bash,subs="+attributes"]
----
kubectl wait --for condition=CrossSiteViewFormed --timeout=300s infinispans.infinispan.org -n {ns} {cluster-name}
----
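
To run both checks against both clusters in one go, something like the following sketch may help. It assumes kubeconfig contexts named `site-a` and `site-b`, the namespace `keycloak`, and the cluster name `infinispan`; adjust all of these to your environment. The `KUBECTL` variable and the helper name `check_site` are introduced only for this example.

[source,bash]
----
# Assumed values; adjust to your environment.
NS="keycloak"
CLUSTER="infinispan"
# Command used to talk to the clusters; defaults to kubectl.
[ -n "$KUBECTL" ] || KUBECTL="kubectl"

# Hypothetical helper: waits for both conditions in the given kubeconfig context.
check_site() {
  for condition in WellFormed CrossSiteViewFormed; do
    $KUBECTL --context "$1" wait --for "condition=$condition" --timeout=300s \
      infinispans.infinispan.org -n "$NS" "$CLUSTER" || return 1
  done
}

# Usage (requires access to both clusters):
#   check_site site-a && check_site site-b
----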
== Next steps
After {jdgserver_name} is deployed and running, use the procedure in the <@links.ha id="connect-keycloak-to-external-infinispan"/> {section} to connect your {project_name} cluster with the {jdgserver_name} cluster.
</@tmpl.guide>

View file

@ -0,0 +1,91 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Deploy {project_name} for HA with the {project_name} Operator"
summary="Building block for a Keycloak deployment"
preview="true"
tileVisible="false" >
This guide describes advanced {project_name} configurations for Kubernetes which are load tested and will recover from single Pod failures.
These instructions are intended for use with the setup described in the <@links.ha id="concepts-active-passive-sync"/> {section}.
Use it together with the other building blocks outlined in the <@links.ha id="bblocks-active-passive-sync"/> {section}.
== Prerequisites
* OpenShift or Kubernetes cluster running.
* Understanding of a <@links.operator id="basic-deployment" /> of {project_name} with the {project_name} Operator.
== Procedure
. Determine the sizing of the deployment using the <@links.ha id="concepts-memory-and-cpu-sizing" /> {section}.
. Install the {project_name} Operator as described in the <@links.operator id="installation" /> {section}.
. Deploy the {project_name} CR with the following values and with the resource requests and limits calculated in the first step:
+
[source,yaml]
----
include::examples/generated/keycloak.yaml[tag=keycloak]
----
<1> The database connection pool initial, max and min size should be identical to allow statement caching for the database.
Adjust this number to meet the needs of your system.
As most requests will not touch the database due to the {project_name} embedded cache, this change can serve several hundred requests per second.
See the <@links.ha id="concepts-database-connections" /> {section} for details.
<2> To be able to analyze the system under load, enable the metrics endpoint.
The disadvantage of the setting is that the metrics will be available at the external {project_name} endpoint, so you must add a filter so that the endpoint is not available from the outside.
Use a reverse proxy in front of {project_name} to filter out those URLs.
<3> The default setting for the internal JGroups thread pools is 200 threads maximum.
The number of all {project_name} threads in the StatefulSet should not exceed the number of JGroups threads to avoid a JGroups thread pool exhaustion which could stall {project_name} request processing.
You might consider limiting the number of {project_name} threads further because multiple concurrent threads will lead to throttling by Kubernetes once the requested CPU limit is reached.
See the <@links.ha id="concepts-threads" /> {section} for details.
<4> The JVM options set additional parameters:
* `jgroups.thread_dumps_threshold` ensures that a log message "`thread pool is full`" appears once the JGroups thread pool is full for the first time.
See the <@links.ha id="concepts-threads" /> {section} for details.
* Adjust the memory settings for the heap.
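
As a worked example of the thread budget described in callout 3, the following sketch divides the default JGroups pool of 200 threads across the Pods of the StatefulSet. The function name `max_threads_per_pod` is made up for this example, and the Pod count of 3 is only an illustration.

[source,bash]
----
JGROUPS_MAX_THREADS=200  # default JGroups thread pool maximum stated above

# Hypothetical helper: largest per-Pod thread count so that the total
# across all Pods stays within the JGroups thread pool.
max_threads_per_pod() {
  echo $((JGROUPS_MAX_THREADS / $1))
}

max_threads_per_pod 3   # prints 66, so 3 Pods x 66 threads = 198 threads in total
----

This is an upper bound only; as noted above, a lower value may be preferable to avoid CPU throttling.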
== Verifying the deployment
Confirm that the {project_name} deployment is ready.
[source,bash]
----
kubectl wait --for=condition=Ready keycloaks.k8s.keycloak.org/keycloak
kubectl wait --for=condition=RollingUpdate=False keycloaks.k8s.keycloak.org/keycloak
----
== Optional: Load shedding
To enable load shedding, limit the number of queued requests.
.Load shedding with Quarkus thread pool size
[source,yaml,indent=0]
----
env:
include::examples/generated/keycloak.yaml[tag=keycloak-queue-size]
----
<1> This change limits the number of queued {project_name} requests.
All exceeding requests are served with an HTTP 503.
See the <@links.ha id="concepts-threads" /> {section} about load shedding for details.
== Optional: Disable sticky sessions
When running on OpenShift with the default passthrough Ingress setup provided by the {project_name} Operator, HAProxy load balances using sticky sessions based on the source IP address.
When running load tests, or when having a reverse proxy in front of HAProxy, you might want to disable this setup to avoid receiving all requests on a single {project_name} Pod.
Add the following supplementary configuration under the `spec` in the {project_name} Custom Resource to disable sticky sessions.
[source,yaml]
----
spec:
  ingress:
    enabled: true
    annotations:
      # When running load tests, disable sticky sessions on the OpenShift HAProxy router
      # to avoid receiving all requests on a single {project_name} Pod.
      haproxy.router.openshift.io/balance: roundrobin
      haproxy.router.openshift.io/disable_cookies: 'true'
----
</@tmpl.guide>

View file

@ -0,0 +1,225 @@
---
# Source: ispn-helm/templates/infinispan.yaml
# There are several callouts in this YAML marked with `# <1>` etc. See `running/infinispan-deployment.adoc` for the details.
# tag::infinispan-credentials[]
apiVersion: v1
kind: Secret
type: Opaque
metadata:
name: connect-secret
namespace: keycloak
data:
identities.yaml: Y3JlZGVudGlhbHM6CiAgLSB1c2VybmFtZTogZGV2ZWxvcGVyCiAgICBwYXNzd29yZDogc3Ryb25nLXBhc3N3b3JkCiAgICByb2xlczoKICAgICAgLSBhZG1pbgo= # <1>
# end::infinispan-credentials[]
---
# Source: ispn-helm/templates/infinispan.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-config
namespace: keycloak
data:
infinispan-config.yaml: >
infinispan:
cacheContainer:
metrics:
namesAsTags: true
gauges: true
histograms: false
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-actionTokens[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: actiontokens
namespace: keycloak
spec:
clusterName: infinispan
name: actionTokens
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
# end::infinispan-cache-actionTokens[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-authenticationSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: authenticationsessions
namespace: keycloak
spec:
clusterName: infinispan
name: authenticationSessions
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
# end::infinispan-cache-authenticationSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-clientSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: clientsessions
namespace: keycloak
spec:
clusterName: infinispan
name: clientSessions
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
# end::infinispan-cache-clientSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-loginFailures[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: loginfailures
namespace: keycloak
spec:
clusterName: infinispan
name: loginFailures
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
# end::infinispan-cache-loginFailures[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-offlineClientSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: offlineclientsessions
namespace: keycloak
spec:
clusterName: infinispan
name: offlineClientSessions
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
# end::infinispan-cache-offlineClientSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-offlineSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: offlinesessions
namespace: keycloak
spec:
clusterName: infinispan
name: offlineSessions
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
# end::infinispan-cache-offlineSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-sessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: sessions
namespace: keycloak
spec:
clusterName: infinispan
name: sessions
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
# end::infinispan-cache-sessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-work[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: work
namespace: keycloak
spec:
clusterName: infinispan
name: work
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
# end::infinispan-cache-work[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc[]
# tag::infinispan-single[]
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan # <1>
  namespace: keycloak
  annotations:
    infinispan.org/monitoring: 'true' # <2>
spec:
  replicas: 3
# end::infinispan-single[]
# end::infinispan-crossdc[]
  # This exposes the http endpoint to interact with its caches - more info - https://infinispan.org/docs/stable/titles/rest/rest.html
  # We can optionally set the host in the below expose yaml block, otherwise it will be set to a default naming pattern.
  expose:
    type: Route
  configMapName: "cluster-config"
  image: quay.io/infinispan/server:14.0.16.Final
  configListener:
    enabled: false
  container:
    extraJvmOpts: '-Dorg.infinispan.openssl=false -Dinfinispan.cluster.name=ISPN -Djgroups.xsite.fd.interval=2000 -Djgroups.xsite.fd.timeout=10000'
  logging:
    categories:
      org.infinispan: info
      org.jgroups: info
# tag::infinispan-crossdc[]
# tag::infinispan-single[]
  security:
    endpointSecretName: connect-secret # <3>
  service:
    type: DataGrid
# end::infinispan-crossdc[]

View file

@ -0,0 +1,384 @@
---
# Source: ispn-helm/templates/infinispan.yaml
# There are several callouts in this YAML marked with `# <1>` etc. See `running/infinispan-deployment.adoc` for the details.
# tag::infinispan-credentials[]
apiVersion: v1
kind: Secret
type: Opaque
metadata:
name: connect-secret
namespace: keycloak
data:
identities.yaml: Y3JlZGVudGlhbHM6CiAgLSB1c2VybmFtZTogZGV2ZWxvcGVyCiAgICBwYXNzd29yZDogc3Ryb25nLXBhc3N3b3JkCiAgICByb2xlczoKICAgICAgLSBhZG1pbgo= # <1>
# end::infinispan-credentials[]
---
# Source: ispn-helm/templates/infinispan.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-config
namespace: keycloak
data:
infinispan-config.yaml: >
infinispan:
cacheContainer:
metrics:
namesAsTags: true
gauges: true
histograms: false
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-status[]
apiVersion: v1
kind: ConfigMap
metadata:
name: crossdc-status
namespace: keycloak
data:
batch: site status --all-caches --site=site-b
# end::infinispan-crossdc-status[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-disconnect[]
apiVersion: v1
kind: ConfigMap
metadata:
name: crossdc-disconnect
namespace: keycloak
data:
batch: site take-offline --all-caches --site=site-b
# end::infinispan-crossdc-disconnect[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-connect[]
apiVersion: v1
kind: ConfigMap
metadata:
name: crossdc-connect
namespace: keycloak
data:
batch: site bring-online --all-caches --site=site-b
# end::infinispan-crossdc-connect[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-push-state[]
apiVersion: v1
kind: ConfigMap
metadata:
name: crossdc-push-state
namespace: keycloak
data:
batch: site push-site-state --all-caches --site=site-b
# end::infinispan-crossdc-push-state[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-push-state-status[]
apiVersion: v1
kind: ConfigMap
metadata:
name: crossdc-push-state-status
namespace: keycloak
data:
batch: site push-site-status --all-caches --site=site-b
# end::infinispan-crossdc-push-state-status[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-reset-push-state-status[]
apiVersion: v1
kind: ConfigMap
metadata:
name: crossdc-reset-push-state-status
namespace: keycloak
data:
batch: site clear-push-state-status --all-caches --site=site-b
# end::infinispan-crossdc-reset-push-state-status[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-clear-caches[]
apiVersion: v1
kind: ConfigMap
metadata:
name: crossdc-clear-caches
namespace: keycloak
data:
batch: |+
clearcache actionTokens
clearcache authenticationSessions
clearcache clientSessions
clearcache loginFailures
clearcache offlineClientSessions
clearcache offlineSessions
clearcache sessions
clearcache work
# end::infinispan-crossdc-clear-caches[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-actionTokens[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
name: actiontokens
namespace: keycloak
spec:
clusterName: infinispan
name: actionTokens
template: |-
distributedCache:
mode: "SYNC"
owners: "2"
statistics: "true"
stateTransfer:
chunkSize: 16
backups:
site-b: # <2>
backup:
strategy: "SYNC" # <3>
stateTransfer:
chunkSize: 16
# end::infinispan-cache-actionTokens[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-authenticationSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: authenticationsessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: authenticationSessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-b: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-authenticationSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-clientSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: clientsessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: clientSessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-b: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-clientSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-loginFailures[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: loginfailures
  namespace: keycloak
spec:
  clusterName: infinispan
  name: loginFailures
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        site-b: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-loginFailures[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-offlineClientSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: offlineclientsessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: offlineClientSessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-b: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-offlineClientSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-offlineSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: offlinesessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: offlineSessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-b: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-offlineSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-sessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: sessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: sessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-b: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-sessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-work[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: work
  namespace: keycloak
spec:
  clusterName: infinispan
  name: work
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        site-b: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-work[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc[]
# tag::infinispan-single[]
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan # <1>
  namespace: keycloak
  annotations:
    infinispan.org/monitoring: 'true' # <2>
spec:
  replicas: 3
# end::infinispan-single[]
# end::infinispan-crossdc[]
  # This exposes the http endpoint to interact with its caches - more info - https://infinispan.org/docs/stable/titles/rest/rest.html
  # We can optionally set the host in the below expose yaml block, otherwise it will be set to a default naming pattern.
  expose:
    type: Route
  configMapName: "cluster-config"
  image: quay.io/infinispan/server:14.0.16.Final
  configListener:
    enabled: false
  container:
    extraJvmOpts: '-Dorg.infinispan.openssl=false -Dinfinispan.cluster.name=ISPN -Djgroups.xsite.fd.interval=2000 -Djgroups.xsite.fd.timeout=10000'
  logging:
    categories:
      org.infinispan: info
      org.jgroups: info
# tag::infinispan-crossdc[]
# tag::infinispan-single[]
  security:
    endpointSecretName: connect-secret # <3>
  service:
    type: DataGrid
# end::infinispan-single[]
    sites:
      local:
        name: site-a # <4>
# end::infinispan-crossdc[]
        discovery:
          launchGossipRouter: true
# tag::infinispan-crossdc[]
        expose:
          type: Route # <5>
        maxRelayNodes: 128
        encryption:
          transportKeyStore:
            secretName: xsite-keystore-secret # <6>
            alias: xsite # <7>
            filename: keystore.p12 # <8>
          routerKeyStore:
            secretName: xsite-keystore-secret # <6>
            alias: xsite # <7>
            filename: keystore.p12 # <8>
          trustStore:
            secretName: xsite-truststore-secret # <9>
            filename: truststore.p12 # <10>
      locations:
        - name: site-b # <11>
          clusterName: infinispan
          namespace: keycloak # <12>
          url: openshift://api.site-b # <13>
          secretName: xsite-token-secret # <14>
# end::infinispan-crossdc[]

@ -0,0 +1,384 @@
---
# Source: ispn-helm/templates/infinispan.yaml
# There are several callouts in this YAML marked with `# <1>` etc. See `running/infinispan-deployment.adoc` for the details.
# tag::infinispan-credentials[]
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: connect-secret
  namespace: keycloak
data:
  identities.yaml: Y3JlZGVudGlhbHM6CiAgLSB1c2VybmFtZTogZGV2ZWxvcGVyCiAgICBwYXNzd29yZDogc3Ryb25nLXBhc3N3b3JkCiAgICByb2xlczoKICAgICAgLSBhZG1pbgo= # <1>
# end::infinispan-credentials[]
---
# Source: ispn-helm/templates/infinispan.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config
  namespace: keycloak
data:
  infinispan-config.yaml: >
    infinispan:
      cacheContainer:
        metrics:
          namesAsTags: true
          gauges: true
          histograms: false
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-status[]
apiVersion: v1
kind: ConfigMap
metadata:
  name: crossdc-status
  namespace: keycloak
data:
  batch: site status --all-caches --site=site-a
# end::infinispan-crossdc-status[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-disconnect[]
apiVersion: v1
kind: ConfigMap
metadata:
  name: crossdc-disconnect
  namespace: keycloak
data:
  batch: site take-offline --all-caches --site=site-a
# end::infinispan-crossdc-disconnect[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-connect[]
apiVersion: v1
kind: ConfigMap
metadata:
  name: crossdc-connect
  namespace: keycloak
data:
  batch: site bring-online --all-caches --site=site-a
# end::infinispan-crossdc-connect[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-push-state[]
apiVersion: v1
kind: ConfigMap
metadata:
  name: crossdc-push-state
  namespace: keycloak
data:
  batch: site push-site-state --all-caches --site=site-a
# end::infinispan-crossdc-push-state[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-push-state-status[]
apiVersion: v1
kind: ConfigMap
metadata:
  name: crossdc-push-state-status
  namespace: keycloak
data:
  batch: site push-site-status --all-caches --site=site-a
# end::infinispan-crossdc-push-state-status[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-reset-push-state-status[]
apiVersion: v1
kind: ConfigMap
metadata:
  name: crossdc-reset-push-state-status
  namespace: keycloak
data:
  batch: site clear-push-state-status --all-caches --site=site-a
# end::infinispan-crossdc-reset-push-state-status[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc-clear-caches[]
apiVersion: v1
kind: ConfigMap
metadata:
  name: crossdc-clear-caches
  namespace: keycloak
data:
  batch: |+
    clearcache actionTokens
    clearcache authenticationSessions
    clearcache clientSessions
    clearcache loginFailures
    clearcache offlineClientSessions
    clearcache offlineSessions
    clearcache sessions
    clearcache work
# end::infinispan-crossdc-clear-caches[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-actionTokens[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: actiontokens
  namespace: keycloak
spec:
  clusterName: infinispan
  name: actionTokens
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        site-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-actionTokens[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-authenticationSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: authenticationsessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: authenticationSessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-authenticationSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-clientSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: clientsessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: clientSessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-clientSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-loginFailures[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: loginfailures
  namespace: keycloak
spec:
  clusterName: infinispan
  name: loginFailures
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        site-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-loginFailures[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-offlineClientSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: offlineclientsessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: offlineClientSessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-offlineClientSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-offlineSessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: offlinesessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: offlineSessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-offlineSessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-sessions[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: sessions
  namespace: keycloak
spec:
  clusterName: infinispan
  name: sessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        mergePolicy: ALWAYS_REMOVE # <1>
        site-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-sessions[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-cache-work[]
apiVersion: infinispan.org/v2alpha1
kind: Cache
metadata:
  name: work
  namespace: keycloak
spec:
  clusterName: infinispan
  name: work
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      stateTransfer:
        chunkSize: 16
      backups:
        site-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            stateTransfer:
              chunkSize: 16
# end::infinispan-cache-work[]
---
# Source: ispn-helm/templates/infinispan.yaml
# tag::infinispan-crossdc[]
# tag::infinispan-single[]
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan # <1>
  namespace: keycloak
  annotations:
    infinispan.org/monitoring: 'true' # <2>
spec:
  replicas: 3
# end::infinispan-single[]
# end::infinispan-crossdc[]
  # This exposes the http endpoint to interact with its caches - more info - https://infinispan.org/docs/stable/titles/rest/rest.html
  # We can optionally set the host in the below expose yaml block, otherwise it will be set to a default naming pattern.
  expose:
    type: Route
  configMapName: "cluster-config"
  image: quay.io/infinispan/server:14.0.16.Final
  configListener:
    enabled: false
  container:
    extraJvmOpts: '-Dorg.infinispan.openssl=false -Dinfinispan.cluster.name=ISPN -Djgroups.xsite.fd.interval=2000 -Djgroups.xsite.fd.timeout=10000'
  logging:
    categories:
      org.infinispan: info
      org.jgroups: info
# tag::infinispan-crossdc[]
# tag::infinispan-single[]
  security:
    endpointSecretName: connect-secret # <3>
  service:
    type: DataGrid
# end::infinispan-single[]
    sites:
      local:
        name: site-b # <4>
# end::infinispan-crossdc[]
        discovery:
          launchGossipRouter: true
# tag::infinispan-crossdc[]
        expose:
          type: Route # <5>
        maxRelayNodes: 128
        encryption:
          transportKeyStore:
            secretName: xsite-keystore-secret # <6>
            alias: xsite # <7>
            filename: keystore.p12 # <8>
          routerKeyStore:
            secretName: xsite-keystore-secret # <6>
            alias: xsite # <7>
            filename: keystore.p12 # <8>
          trustStore:
            secretName: xsite-truststore-secret # <9>
            filename: truststore.p12 # <10>
      locations:
        - name: site-a # <11>
          clusterName: infinispan
          namespace: keycloak # <12>
          url: openshift://api.site-a # <13>
          secretName: xsite-token-secret # <14>
# end::infinispan-crossdc[]

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@ -0,0 +1,283 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- end::keycloak-ispn-configmap[] -->
<!--
~ Copyright 2019 Red Hat, Inc. and/or its affiliates
~ and other contributors as indicated by the @author tags.
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<!--tag::keycloak-ispn-configmap[] -->
<infinispan
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:14.0 https://www.infinispan.org/schemas/infinispan-config-14.0.xsd
urn:infinispan:config:store:remote:14.0 https://www.infinispan.org/schemas/infinispan-cachestore-remote-config-14.0.xsd"
xmlns="urn:infinispan:config:14.0">
<!--end::keycloak-ispn-configmap[] -->
<!-- the statistics="true" attribute is not part of the original KC config and was added by Keycloak Benchmark -->
<cache-container name="keycloak" statistics="true">
<transport lock-timeout="60000"/>
<metrics names-as-tags="true" />
<local-cache name="realms" simple-cache="true" statistics="true">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<memory max-count="10000"/>
</local-cache>
<local-cache name="users" simple-cache="true" statistics="true">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<memory max-count="10000"/>
</local-cache>
<!--tag::keycloak-ispn-remotestore[] -->
<distributed-cache name="sessions" owners="2" statistics="true">
<expiration lifespan="-1"/>
<persistence passivation="false"> <!--1-->
<remote-store xmlns="urn:infinispan:config:store:remote:14.0"
cache="sessions"
raw-values="true"
shared="true"
segmented="false">
<remote-server host="${env.KC_REMOTE_STORE_HOST}"
port="${env.KC_REMOTE_STORE_PORT}"/> <!--2-->
<connection-pool max-active="16"
exhausted-action="CREATE_NEW"/>
<security>
<authentication server-name="infinispan">
<digest username="${env.KC_REMOTE_STORE_USERNAME}"
password="${env.KC_REMOTE_STORE_PASSWORD}"
realm="default"/> <!--3-->
</authentication>
<encryption protocol="TLSv1.3"
sni-hostname="${env.KC_REMOTE_STORE_HOST}">
<truststore filename="/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
type="pem"/> <!--4-->
</encryption>
</security>
</remote-store>
</persistence>
</distributed-cache>
<!--end::keycloak-ispn-remotestore[] -->
<distributed-cache name="authenticationSessions" owners="2" statistics="true">
<expiration lifespan="-1"/>
<persistence passivation="false">
<remote-store xmlns="urn:infinispan:config:store:remote:14.0"
cache="authenticationSessions"
raw-values="true"
shared="true"
segmented="false">
<remote-server host="${env.KC_REMOTE_STORE_HOST}"
port="${env.KC_REMOTE_STORE_PORT}"/>
<connection-pool max-active="16"
exhausted-action="CREATE_NEW"/>
<security>
<authentication server-name="infinispan">
<digest username="${env.KC_REMOTE_STORE_USERNAME}"
password="${env.KC_REMOTE_STORE_PASSWORD}"
realm="default"/>
</authentication>
<encryption protocol="TLSv1.3"
sni-hostname="${env.KC_REMOTE_STORE_HOST}">
<truststore filename="/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
type="pem"/>
</encryption>
</security>
</remote-store>
</persistence>
</distributed-cache>
<distributed-cache name="offlineSessions" owners="2" statistics="true">
<expiration lifespan="-1"/>
<persistence passivation="false">
<remote-store xmlns="urn:infinispan:config:store:remote:14.0"
cache="offlineSessions"
raw-values="true"
shared="true"
segmented="false">
<remote-server host="${env.KC_REMOTE_STORE_HOST}"
port="${env.KC_REMOTE_STORE_PORT}"/>
<connection-pool max-active="16"
exhausted-action="CREATE_NEW"/>
<security>
<authentication server-name="infinispan">
<digest username="${env.KC_REMOTE_STORE_USERNAME}"
password="${env.KC_REMOTE_STORE_PASSWORD}"
realm="default"/>
</authentication>
<encryption protocol="TLSv1.3"
sni-hostname="${env.KC_REMOTE_STORE_HOST}">
<truststore filename="/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
type="pem"/>
</encryption>
</security>
</remote-store>
</persistence>
</distributed-cache>
<distributed-cache name="clientSessions" owners="2" statistics="true">
<expiration lifespan="-1"/>
<persistence passivation="false">
<remote-store xmlns="urn:infinispan:config:store:remote:14.0"
cache="clientSessions"
raw-values="true"
shared="true"
segmented="false">
<remote-server host="${env.KC_REMOTE_STORE_HOST}"
port="${env.KC_REMOTE_STORE_PORT}"/>
<connection-pool max-active="16"
exhausted-action="CREATE_NEW"/>
<security>
<authentication server-name="infinispan">
<digest username="${env.KC_REMOTE_STORE_USERNAME}"
password="${env.KC_REMOTE_STORE_PASSWORD}"
realm="default"/>
</authentication>
<encryption protocol="TLSv1.3" sni-hostname="${env.KC_REMOTE_STORE_HOST}">
<truststore filename="/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
type="pem"/>
</encryption>
</security>
</remote-store>
</persistence>
</distributed-cache>
<distributed-cache name="offlineClientSessions" owners="2" statistics="true">
<expiration lifespan="-1"/>
<persistence passivation="false">
<remote-store xmlns="urn:infinispan:config:store:remote:14.0"
cache="offlineClientSessions"
raw-values="true"
shared="true"
segmented="false">
<remote-server host="${env.KC_REMOTE_STORE_HOST}"
port="${env.KC_REMOTE_STORE_PORT}"/>
<connection-pool max-active="16"
exhausted-action="CREATE_NEW"/>
<security>
<authentication server-name="infinispan">
<digest username="${env.KC_REMOTE_STORE_USERNAME}"
password="${env.KC_REMOTE_STORE_PASSWORD}"
realm="default"/>
</authentication>
<encryption protocol="TLSv1.3" sni-hostname="${env.KC_REMOTE_STORE_HOST}">
<truststore filename="/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
type="pem"/>
</encryption>
</security>
</remote-store>
</persistence>
</distributed-cache>
<distributed-cache name="loginFailures" owners="2" statistics="true">
<expiration lifespan="-1"/>
<persistence passivation="false">
<remote-store xmlns="urn:infinispan:config:store:remote:14.0"
cache="loginFailures"
raw-values="true"
shared="true"
segmented="false">
<remote-server host="${env.KC_REMOTE_STORE_HOST}"
port="${env.KC_REMOTE_STORE_PORT}"/>
<connection-pool max-active="16"
exhausted-action="CREATE_NEW"/>
<security>
<authentication server-name="infinispan">
<digest username="${env.KC_REMOTE_STORE_USERNAME}"
password="${env.KC_REMOTE_STORE_PASSWORD}"
realm="default"/>
</authentication>
<encryption protocol="TLSv1.3" sni-hostname="${env.KC_REMOTE_STORE_HOST}">
<truststore filename="/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
type="pem"/>
</encryption>
</security>
</remote-store>
</persistence>
</distributed-cache>
<local-cache name="authorization" simple-cache="true" statistics="true">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<memory max-count="10000"/>
</local-cache>
<!--tag::keycloak-ispn-remotestore-work[] -->
<replicated-cache name="work" statistics="true">
<expiration lifespan="-1"/>
<persistence passivation="false">
<remote-store xmlns="urn:infinispan:config:store:remote:14.0"
cache="work"
raw-values="true"
shared="true"
segmented="false">
<remote-server host="${env.KC_REMOTE_STORE_HOST}"
port="${env.KC_REMOTE_STORE_PORT}"/>
<connection-pool max-active="16"
exhausted-action="CREATE_NEW"/>
<security>
<authentication server-name="infinispan">
<digest username="${env.KC_REMOTE_STORE_USERNAME}"
password="${env.KC_REMOTE_STORE_PASSWORD}"
realm="default"/>
</authentication>
<encryption protocol="TLSv1.3" sni-hostname="${env.KC_REMOTE_STORE_HOST}">
<truststore filename="/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
type="pem"/>
</encryption>
</security>
</remote-store>
</persistence>
</replicated-cache>
<!--end::keycloak-ispn-remotestore-work[] -->
<local-cache name="keys" simple-cache="true" statistics="true">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<expiration max-idle="3600000"/>
<memory max-count="1000"/>
</local-cache>
<distributed-cache name="actionTokens" owners="2" statistics="true">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<expiration max-idle="-1" lifespan="-1" interval="300000"/>
<memory max-count="-1"/>
<persistence passivation="false">
<remote-store xmlns="urn:infinispan:config:store:remote:14.0"
cache="actionTokens"
raw-values="true"
shared="true"
segmented="false">
<remote-server host="${env.KC_REMOTE_STORE_HOST}"
port="${env.KC_REMOTE_STORE_PORT}"/>
<connection-pool max-active="16"
exhausted-action="CREATE_NEW"/>
<security>
<authentication server-name="infinispan">
<digest username="${env.KC_REMOTE_STORE_USERNAME}"
password="${env.KC_REMOTE_STORE_PASSWORD}"
realm="default"/>
</authentication>
<encryption protocol="TLSv1.3" sni-hostname="${env.KC_REMOTE_STORE_HOST}">
<truststore filename="/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
type="pem"/>
</encryption>
</security>
</remote-store>
</persistence>
</distributed-cache>
</cache-container>
</infinispan>

@ -0,0 +1,12 @@
= Keycloak High Availability guide
include::../attributes.adoc[]
<#list ctx.guides as guide>
:links_high-availability_${guide.id}_name: ${guide.title}
:links_high-availability_${guide.id}_url: #${guide.id}
</#list>
<#list ctx.guides as guide>
include::${guide.template}[leveloffset=+1]
</#list>

@ -0,0 +1,42 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Multi-site deployments"
summary="Connect multiple {project_name} deployments in different sites to increase the overall availability"
preview="true" >
{project_name} supports deployments that consist of multiple {project_name} instances that connect to each other using the embedded Infinispan; load balancers can distribute the load evenly across those instances.
Those setups are intended for a transparent network on a single site.
The Keycloak high-availability guide goes one step further to describe setups across multiple sites.
While this setup adds complexity, the extra availability may be needed for some environments.
The different {sections} introduce the necessary concepts and building blocks.
For each building block, a blueprint shows how to set up a fully functional example.
Additional performance tuning and security hardening are still recommended when preparing a production setup.
== Concept and building block overview
* <@links.ha id="concepts-active-passive-sync" />
* <@links.ha id="bblocks-active-passive-sync" />
* <@links.ha id="concepts-database-connections" />
* <@links.ha id="concepts-threads" />
* <@links.ha id="concepts-memory-and-cpu-sizing" />
== Blueprints for building blocks
* <@links.ha id="deploy-aurora-multi-az" />
* <@links.ha id="deploy-keycloak-kubernetes" />
* <@links.ha id="deploy-infinispan-kubernetes-crossdc" />
* <@links.ha id="connect-keycloak-to-external-infinispan" />
* <@links.ha id="deploy-aws-route53-loadbalancer" />
== Operational procedures
* <@links.ha id="operate-failover" />
* <@links.ha id="operate-switch-over" />
* <@links.ha id="operate-network-partition-recovery" />
* <@links.ha id="operate-switch-back" />
</@tmpl.guide>

@ -0,0 +1,32 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Fail over to the secondary site"
summary="This describes the automatic and operational procedures necessary"
preview="true" >
This {section} describes the steps to fail over from the primary site to the secondary site in a setup as outlined in <@links.ha id="concepts-active-passive-sync" /> together with the blueprints outlined in <@links.ha id="bblocks-active-passive-sync" />.
== When to use this procedure
A failover from the primary site to the secondary site will happen automatically based on the checks configured in the load balancer.
When the primary site loses its state in {jdgserver_name} or a network partition occurs that prevents the synchronization, manual procedures are necessary to recover the primary site before it can handle traffic again; see the <@links.ha id="operate-switch-back" /> {section}.
To prevent an automatic fallback to the primary site before those manual steps have been performed, configure the load balancer as described below.
For a graceful switch to the secondary site, follow the instructions in the <@links.ha id="operate-switch-over" /> {section}.
See the <@links.ha id="introduction" /> {section} for different operational procedures.
== Procedure
Follow these steps to manually force a failover.
=== Route53
To force Route53 to mark the primary site as permanently not available and prevent an automatic fallback, edit the health check in AWS to point to a non-existent route (`health/down`).
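
If the health check is managed declaratively instead of being edited by hand, the same change can be expressed as configuration. The following is a minimal sketch that assumes the check is defined as an `AWS::Route53::HealthCheck` resource in a CloudFormation template, which this {section} does not prescribe. The logical name, host name, protocol, and port are hypothetical placeholders; only the resource path comes from the step above.

[source,yaml]
----
Resources:
  PrimarySiteHealthCheck: # hypothetical logical name
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        Type: HTTPS # assumed protocol
        FullyQualifiedDomainName: primary.example.com # hypothetical host of the primary site
        Port: 443 # assumed port
        # Non-existent route: the check keeps failing, so the primary site stays marked as unavailable.
        ResourcePath: /health/down
----

Keeping the path change in version-controlled configuration makes it easy to see that the failover was forced deliberately and to revert it during the switch back.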
</@tmpl.guide>

@ -0,0 +1,76 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Recover from an out-of-sync passive site"
summary="This describes the automatic and operational procedures necessary"
preview="true" >
This {section} describes the procedures required to synchronize the secondary site with the primary site in a setup as outlined in <@links.ha id="concepts-active-passive-sync" /> together with the blueprints outlined in <@links.ha id="bblocks-active-passive-sync" />.
include::partials/infinispan/infinispan-attributes.adoc[]
// used by the CLI commands to avoid duplicating the code.
:stale-site: secondary
:keep-site: primary
:keep-site-name: {site-a-cr}
:stale-site-name: {site-b-cr}
== When to use this procedure
Use this procedure after a temporary disconnection between sites where {jdgserver_name} was disconnected and the contents of the caches are out of sync.
At the end of the procedure, the session contents on the secondary site have been discarded and replaced by the session contents of the primary site.
All caches in the secondary site have been cleared to prevent invalid cached contents.
See the <@links.ha id="introduction" /> {section} for different operational procedures.
== Procedures
=== {jdgserver_name} Cluster
For the context of this {section}, `{site-a}` is the primary site and is active, and `{site-b}` is the secondary site and is passive.
Network partitions may happen between the sites, and the replication between the {jdgserver_name} clusters will then stop.
These procedures bring both sites back in sync.
WARNING: Transferring the full state may impact the {jdgserver_name} cluster performance by increasing the response time and/or resource usage.
The first procedure is to delete the stale data from the secondary site.
. Log in to your secondary site.
. Shut down {project_name}.
This will clear all {project_name} caches, and it prevents the state of {project_name} from being out-of-sync with {jdgserver_name}.
+
When deploying {project_name} using the {project_name} Operator, change the number of {project_name} instances in the {project_name} Custom Resource to 0.
<#include "partials/infinispan/infinispan-cli-connect.adoc" />
include::partials/infinispan/infinispan-cli-clear-caches.adoc[]
Now we are ready to transfer the state from the primary site to the secondary site.
. Log in to your primary site.
<#include "partials/infinispan/infinispan-cli-connect.adoc" />
include::partials/infinispan/infinispan-cli-state-transfer.adoc[]
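
As an alternative to typing the CLI commands interactively, the {jdgserver_name} Operator can run a batch file stored in a ConfigMap through a `Batch` custom resource. The following is a sketch only. It assumes that the primary site has a ConfigMap named `crossdc-push-state` whose `batch` entry holds the `site push-site-state` command for the secondary site, and that the cluster is named `infinispan` in the `keycloak` namespace. The resource name `crossdc-push-state-batch` is a hypothetical placeholder.

[source,yaml]
----
apiVersion: infinispan.org/v2alpha1
kind: Batch
metadata:
  name: crossdc-push-state-batch # hypothetical name
  namespace: keycloak # assumed namespace
spec:
  cluster: infinispan # name of the Infinispan CR to run the batch against
  configMap: crossdc-push-state # ConfigMap assumed to hold the batch file
----

Running the batch this way does not replace checking the state transfer status; the transfer still has to complete before {project_name} is started again.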
Now that the state is available in the secondary site, {project_name} can be started again:
. Log in to your secondary site.
. Start {project_name}.
+
When deploying {project_name} using the {project_name} Operator, change the number of {project_name} instances in the {project_name} Custom Resource to the original value.
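
With the {project_name} Operator, the shutdown and startup steps above come down to a single field in the {project_name} Custom Resource. The fragment below is a sketch under assumptions: the resource name and the namespace are placeholders, all other fields of the existing resource are omitted, and the original instance count of your deployment is not known here.

[source,yaml]
----
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak # hypothetical resource name
  namespace: keycloak # assumed namespace
spec:
  # 0 shuts down all Keycloak Pods; restore the previous value to start Keycloak again.
  instances: 0
----

Note the original value of `instances` before setting it to 0 so that the same capacity can be restored in the final step.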
=== AWS Aurora Database
No action required.
=== Route53
No action required.
</@tmpl.guide>

@ -0,0 +1,81 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Switch back to the primary site"
summary="This describes the operational procedures necessary"
preview="true" >
These procedures switch back to the primary site after a failover or switchover to the secondary site in a setup as outlined in <@links.ha id="concepts-active-passive-sync" /> together with the blueprints outlined in <@links.ha id="bblocks-active-passive-sync" />.
include::partials/infinispan/infinispan-attributes.adoc[]
// used by the CLI commands to avoid duplicating the code.
:stale-site: primary
:keep-site: secondary
:keep-site-name: {site-b-cr}
:stale-site-name: {site-a-cr}
== When to use this procedure
These procedures bring the primary site back to operation when the secondary site is handling all the traffic.
At the end of the {section}, the primary site is online again and handles the traffic.
This procedure is necessary when the primary site has lost its state in {jdgserver_name}, a network partition occurred between the primary and the secondary site while the secondary site was active, or the replication was disabled as described in the <@links.ha id="operate-switch-over"/> {section}.
If the data in {jdgserver_name} on both sites is still in sync, the procedure for {jdgserver_name} can be skipped.
See the <@links.ha id="introduction" /> {section} for different operational procedures.
== Procedures
=== {jdgserver_name} Cluster
For the context of this {section}, `{site-a}` is the primary site, recovering back to operation, and `{site-b}` is the secondary site, running in production.
After the {jdgserver_name} in the primary site is back online and has joined the cross-site channel (see <@links.ha id="deploy-infinispan-kubernetes-crossdc" />#verifying-the-deployment on how to verify the {jdgserver_name} deployment), the state transfer must be manually started from the secondary site.
After the state in the primary site is cleared, the full state is transferred from the secondary site to the primary site. The transfer must complete before the primary site can start handling incoming requests.
WARNING: Transferring the full state may impact the {jdgserver_name} cluster performance by increasing the response time and/or resource usage.
The first procedure is to delete any stale data from the primary site.
. Log in to the primary site.
. Shut down {project_name}.
This action clears all {project_name} caches and prevents the state of {project_name} from being out-of-sync with {jdgserver_name}.
+
When deploying {project_name} using the {project_name} Operator, change the number of {project_name} instances in the {project_name} Custom Resource to 0.
<#include "partials/infinispan/infinispan-cli-connect.adoc" />
include::partials/infinispan/infinispan-cli-clear-caches.adoc[]
Now we are ready to transfer the state from the secondary site to the primary site.
. Log in to your secondary site.
<#include "partials/infinispan/infinispan-cli-connect.adoc" />
include::partials/infinispan/infinispan-cli-state-transfer.adoc[]
. Log in to the primary site.
. Start {project_name}.
+
When deploying {project_name} using the {project_name} Operator, change the number of {project_name} instances in the {project_name} Custom Resource to the original value.
Both {jdgserver_name} clusters are in sync and the switchover from secondary back to the primary site can be performed.
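
As a minimal sketch of the shutdown and start steps above when using the {project_name} Operator, the following helper patches the instance count in the Custom Resource.
The function name `scale_keycloak`, the Custom Resource name `keycloak`, the namespace `keycloak`, and the original instance count `3` are illustrative assumptions; adapt them to your deployment.

[source,bash]
----
# Hypothetical helper: set spec.instances of a Keycloak Custom Resource.
# Arguments: <namespace> <custom-resource-name> <instances>
scale_keycloak() {
  kubectl -n "$1" patch "keycloaks.k8s.keycloak.org/$2" \
    --type=merge \
    -p "{\"spec\":{\"instances\":$3}}"
}

# Usage (assumed names and values):
#   scale_keycloak keycloak keycloak 0   # before clearing the caches
#   scale_keycloak keycloak keycloak 3   # after the state transfer completes
----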
=== AWS Aurora Database
include::partials/aurora/aurora-failover.adoc[]
=== Route53
If switching over to the secondary site has been triggered by changing the health endpoint, edit the health check in AWS to point to the correct endpoint (`health/live`).
After some minutes, the clients will notice the change and traffic will gradually move back to the primary site.
</@tmpl.guide>

View file

@ -0,0 +1,91 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/links.adoc" as links>
<@tmpl.guide
title="Switch over to the secondary site"
summary="This topic describes the operational procedures necessary to switch over to the secondary site"
preview="true" >
This procedure switches from the primary site to the secondary site when using a setup as outlined in <@links.ha id="concepts-active-passive-sync" /> together with the blueprints outlined in <@links.ha id="bblocks-active-passive-sync" />.
include::partials/infinispan/infinispan-attributes.adoc[]
== When to use this procedure
Use this procedure to gracefully take the primary site offline.
Once the primary site is back online, use the {sections} <@links.ha id="operate-network-partition-recovery" /> and <@links.ha id="operate-switch-back" /> to return to the original state with the primary site being active.
See the <@links.ha id="introduction" /> {section} for different operational procedures.
== Procedures
=== {jdgserver_name} Cluster
For the context of this {section}, `{site-a}` is the primary site and `{site-b}` is the secondary site.
When you are ready to take a site offline, a good practice is to disable the replication towards it.
This action prevents errors or delays when the channels are disconnected between the primary and the secondary site.
==== Procedures to disable replication from the secondary to the primary site
. Log in to your secondary site.
<#include "partials/infinispan/infinispan-cli-connect.adoc" />
. Disable the replication to the primary site by running the following command:
+
.Command:
[source,bash,subs="+attributes"]
----
site take-offline --all-caches --site={site-a-cr}
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"offlineClientSessions" : "ok",
"authenticationSessions" : "ok",
"sessions" : "ok",
"clientSessions" : "ok",
"work" : "ok",
"offlineSessions" : "ok",
"loginFailures" : "ok",
"actionTokens" : "ok"
}
----
. Check that the replication status is `offline`.
+
.Command:
[source,bash,subs="+attributes"]
----
site status --all-caches --site={site-a-cr}
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"status" : "offline"
}
----
+
If the status is not `offline`, repeat the previous step.
The {jdgserver_name} cluster in the secondary site is ready to handle requests without trying to replicate to the primary site.
=== AWS Aurora Database
include::partials/aurora/aurora-failover.adoc[]
=== {project_name} Cluster
No action required.
=== Route53
To force Route53 to mark the primary site as not available, edit the health check in AWS to point to a non-existent route (`health/down`). After some minutes, the clients will notice the change and traffic will gradually move over to the secondary site.
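
The health check edit can also be scripted with the AWS CLI.
The following is a minimal sketch, assuming you already know the ID of the Route53 health check that monitors the primary site; the function name and the ID in the usage comment are placeholders, not values from this blueprint.

[source,bash]
----
# Hypothetical helper: point an existing Route53 health check at a new path.
# Arguments: <health-check-id> <resource-path>
set_health_check_path() {
  aws route53 update-health-check \
    --health-check-id "$1" \
    --resource-path "$2"
}

# Usage (placeholder ID):
#   set_health_check_path 11111111-2222-3333-4444-555555555555 /health/down
----

Calling the same helper with `/health/live` restores the original health check when switching back.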
</@tmpl.guide>

View file

@ -0,0 +1,216 @@
. Retrieve the Aurora VPC
+
.Command:
[source,bash]
----
aws ec2 describe-vpcs \
--filters "Name=tag:AuroraCluster,Values=keycloak-aurora" \
--query 'Vpcs[*].VpcId' \
--region eu-west-1 \
--output text
----
+
.Output:
[source]
----
vpc-0b40bd7c59dbe4277
----
+
. Retrieve the ROSA cluster VPC
.. Log in to the ROSA cluster using `oc`
.. Retrieve the ROSA VPC
+
.Command:
[source,bash]
----
<#noparse>
NODE=$(kubectl get nodes --selector=node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')
aws ec2 describe-instances \
--filters "Name=private-dns-name,Values=${NODE}" \
--query 'Reservations[0].Instances[0].VpcId' \
--region eu-west-1 \
--output text
</#noparse>
----
+
.Output:
[source]
----
vpc-0b721449398429559
----
+
. Create Peering Connection
+
.Command:
[source,bash]
----
aws ec2 create-vpc-peering-connection \
--vpc-id vpc-0b721449398429559 \# <1>
--peer-vpc-id vpc-0b40bd7c59dbe4277 \# <2>
--peer-region eu-west-1 \
--region eu-west-1
----
<1> ROSA cluster VPC
<2> Aurora VPC
+
.Output:
[source,json]
----
{
"VpcPeeringConnection": {
"AccepterVpcInfo": {
"OwnerId": "606671647913",
"VpcId": "vpc-0b40bd7c59dbe4277",
"Region": "eu-west-1"
},
"ExpirationTime": "2023-11-08T13:26:30+00:00",
"RequesterVpcInfo": {
"CidrBlock": "10.0.17.0/24",
"CidrBlockSet": [
{
"CidrBlock": "10.0.17.0/24"
}
],
"OwnerId": "606671647913",
"PeeringOptions": {
"AllowDnsResolutionFromRemoteVpc": false,
"AllowEgressFromLocalClassicLinkToRemoteVpc": false,
"AllowEgressFromLocalVpcToRemoteClassicLink": false
},
"VpcId": "vpc-0b721449398429559",
"Region": "eu-west-1"
},
"Status": {
"Code": "initiating-request",
"Message": "Initiating Request to 606671647913"
},
"Tags": [],
"VpcPeeringConnectionId": "pcx-0cb23d66dea3dca9f"
}
}
----
+
. Wait for the peering connection to exist
+
.Command:
[source,bash]
----
aws ec2 wait vpc-peering-connection-exists \
--vpc-peering-connection-ids pcx-0cb23d66dea3dca9f \
--region eu-west-1
----
+
. Accept the peering connection
+
.Command:
[source,bash]
----
aws ec2 accept-vpc-peering-connection \
--vpc-peering-connection-id pcx-0cb23d66dea3dca9f \
--region eu-west-1
----
+
.Output:
[source,json]
----
{
"VpcPeeringConnection": {
"AccepterVpcInfo": {
"CidrBlock": "192.168.0.0/16",
"CidrBlockSet": [
{
"CidrBlock": "192.168.0.0/16"
}
],
"OwnerId": "606671647913",
"PeeringOptions": {
"AllowDnsResolutionFromRemoteVpc": false,
"AllowEgressFromLocalClassicLinkToRemoteVpc": false,
"AllowEgressFromLocalVpcToRemoteClassicLink": false
},
"VpcId": "vpc-0b40bd7c59dbe4277",
"Region": "eu-west-1"
},
"RequesterVpcInfo": {
"CidrBlock": "10.0.17.0/24",
"CidrBlockSet": [
{
"CidrBlock": "10.0.17.0/24"
}
],
"OwnerId": "606671647913",
"PeeringOptions": {
"AllowDnsResolutionFromRemoteVpc": false,
"AllowEgressFromLocalClassicLinkToRemoteVpc": false,
"AllowEgressFromLocalVpcToRemoteClassicLink": false
},
"VpcId": "vpc-0b721449398429559",
"Region": "eu-west-1"
},
"Status": {
"Code": "provisioning",
"Message": "Provisioning"
},
"Tags": [],
"VpcPeeringConnectionId": "pcx-0cb23d66dea3dca9f"
}
}
----
+
. Update ROSA cluster VPC route-table
+
.Command:
[source,bash]
----
<#noparse>
ROSA_PUBLIC_ROUTE_TABLE_ID=$(aws ec2 describe-route-tables \
--filters "Name=vpc-id,Values=vpc-0b721449398429559" "Name=association.main,Values=true" \# <1>
--query "RouteTables[*].RouteTableId" \
--output text \
--region eu-west-1
)
aws ec2 create-route \
--route-table-id ${ROSA_PUBLIC_ROUTE_TABLE_ID} \
--destination-cidr-block 192.168.0.0/16 \# <2>
--vpc-peering-connection-id pcx-0cb23d66dea3dca9f \
--region eu-west-1
</#noparse>
----
<1> ROSA cluster VPC
<2> This must be the same as the cidr-block used when creating the Aurora VPC
+
. Update the Aurora Security Group
+
.Command:
[source,bash]
----
<#noparse>
AURORA_SECURITY_GROUP_ID=$(aws ec2 describe-security-groups \
--filters "Name=group-name,Values=keycloak-aurora-security-group" \
--query "SecurityGroups[*].GroupId" \
--region eu-west-1 \
--output text
)
aws ec2 authorize-security-group-ingress \
--group-id ${AURORA_SECURITY_GROUP_ID} \
--protocol tcp \
--port 5432 \
--cidr 10.0.17.0/24 \# <1>
--region eu-west-1
</#noparse>
----
<1> The "machine_cidr" of the ROSA cluster
+
.Output:
[source,json]
----
{
"Return": true,
"SecurityGroupRules": [
{
"SecurityGroupRuleId": "sgr-0785d2f04b9cec3f5",
"GroupId": "sg-0d746cc8ad8d2e63b",
"GroupOwnerId": "606671647913",
"IsEgress": false,
"IpProtocol": "tcp",
"FromPort": 5432,
"ToPort": 5432,
"CidrIpv4": "10.0.17.0/24"
}
]
}
----
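
The create, wait, and accept steps above can be chained without copying the peering connection ID by hand.
The following is a minimal sketch, assuming both VPCs are in the same account and region as in this example; the function name `peer_vpcs` is hypothetical, and it captures the ID with the AWS CLI `--query` option instead of reading it from the JSON output.

[source,bash]
----
# Hypothetical helper: create, wait for, and accept a peering connection.
# Arguments: <rosa-vpc-id> <aurora-vpc-id> <region>
# Prints the ID of the new peering connection.
peer_vpcs() {
  local pcx_id
  pcx_id=$(aws ec2 create-vpc-peering-connection \
    --vpc-id "$1" \
    --peer-vpc-id "$2" \
    --peer-region "$3" \
    --region "$3" \
    --query 'VpcPeeringConnection.VpcPeeringConnectionId' \
    --output text) || return 1
  aws ec2 wait vpc-peering-connection-exists \
    --vpc-peering-connection-ids "$pcx_id" \
    --region "$3" || return 1
  aws ec2 accept-vpc-peering-connection \
    --vpc-peering-connection-id "$pcx_id" \
    --region "$3" > /dev/null || return 1
  echo "$pcx_id"
}

# Usage with the example IDs from the steps above:
#   peer_vpcs vpc-0b721449398429559 vpc-0b40bd7c59dbe4277 eu-west-1
----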

View file

@ -0,0 +1,15 @@
Assuming a Regional multi-AZ Aurora deployment, the current writer instance should be in the same availability zone as the active {project_name} cluster to avoid latencies and communication across availability zones.
Switching the writer instance of Aurora will lead to a short downtime. Leaving the writer instance in the other site, with a slightly longer latency, might be acceptable for some deployments.
Therefore, this step might be deferred to a maintenance window or skipped depending on the circumstances of the deployment.
To change the writer instance, run a failover.
This change will make the database unavailable for a short time.
{project_name} will need to re-establish database connections.
To fail over the writer instance to the other AZ, issue the following command:
[source,bash]
----
aws rds failover-db-cluster --db-cluster-identifier ...
----
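
Before running the failover, it can help to check which instance is currently the writer and to name the instance that should take over.
The following is a minimal sketch; the function names are hypothetical, and the identifiers in the usage comments reuse the example names `keycloak-aurora` and `keycloak-aurora-instance-2` from the Aurora blueprint, so replace them with your own.

[source,bash]
----
# Hypothetical helper: print the identifier of the current writer instance.
# Arguments: <db-cluster-identifier> <region>
current_writer() {
  aws rds describe-db-clusters \
    --db-cluster-identifier "$1" \
    --query 'DBClusters[0].DBClusterMembers[?IsClusterWriter==`true`].DBInstanceIdentifier' \
    --region "$2" \
    --output text
}

# Hypothetical helper: fail over to a specific instance of the cluster.
# Arguments: <db-cluster-identifier> <target-db-instance-identifier> <region>
failover_to() {
  aws rds failover-db-cluster \
    --db-cluster-identifier "$1" \
    --target-db-instance-identifier "$2" \
    --region "$3"
}

# Usage (example identifiers):
#   current_writer keycloak-aurora eu-west-1
#   failover_to keycloak-aurora keycloak-aurora-instance-2 eu-west-1
----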

View file

@ -0,0 +1,356 @@
. Create a VPC for the Aurora cluster
+
.Command:
[source,bash]
----
aws ec2 create-vpc \
--cidr-block 192.168.0.0/16 \
--tag-specifications "ResourceType=vpc, Tags=[{Key=AuroraCluster,Value=keycloak-aurora}]" \# <1>
--region eu-west-1
----
<1> We add an optional tag with the name of the Aurora cluster so that we can easily retrieve the VPC.
+
.Output:
[source,json]
----
{
"Vpc": {
"CidrBlock": "192.168.0.0/16",
"DhcpOptionsId": "dopt-0bae7798158bc344f",
"State": "pending",
"VpcId": "vpc-0b40bd7c59dbe4277",
"OwnerId": "606671647913",
"InstanceTenancy": "default",
"Ipv6CidrBlockAssociationSet": [],
"CidrBlockAssociationSet": [
{
"AssociationId": "vpc-cidr-assoc-09a02a83059ba5ab6",
"CidrBlock": "192.168.0.0/16",
"CidrBlockState": {
"State": "associated"
}
}
],
"IsDefault": false
}
}
----
+
. Create a subnet for each availability zone that Aurora will be deployed to, using the `VpcId` of the newly created VPC.
+
NOTE: The cidr-block range specified for each of the availability zones must not overlap.
+
.. Zone A
+
.Command:
[source,bash]
----
aws ec2 create-subnet \
--availability-zone "eu-west-1a" \
--vpc-id vpc-0b40bd7c59dbe4277 \
--cidr-block 192.168.0.0/19 \
--region eu-west-1
----
+
.Output:
[source,json]
----
{
"Subnet": {
"AvailabilityZone": "eu-west-1a",
"AvailabilityZoneId": "euw1-az3",
"AvailableIpAddressCount": 8187,
"CidrBlock": "192.168.0.0/19",
"DefaultForAz": false,
"MapPublicIpOnLaunch": false,
"State": "available",
"SubnetId": "subnet-0d491a1a798aa878d",
"VpcId": "vpc-0b40bd7c59dbe4277",
"OwnerId": "606671647913",
"AssignIpv6AddressOnCreation": false,
"Ipv6CidrBlockAssociationSet": [],
"SubnetArn": "arn:aws:ec2:eu-west-1:606671647913:subnet/subnet-0d491a1a798aa878d",
"EnableDns64": false,
"Ipv6Native": false,
"PrivateDnsNameOptionsOnLaunch": {
"HostnameType": "ip-name",
"EnableResourceNameDnsARecord": false,
"EnableResourceNameDnsAAAARecord": false
}
}
}
----
.. Zone B
+
.Command:
[source,bash]
----
aws ec2 create-subnet \
--availability-zone "eu-west-1b" \
--vpc-id vpc-0b40bd7c59dbe4277 \
--cidr-block 192.168.32.0/19 \
--region eu-west-1
----
+
.Output:
[source,json]
----
{
"Subnet": {
"AvailabilityZone": "eu-west-1b",
"AvailabilityZoneId": "euw1-az1",
"AvailableIpAddressCount": 8187,
"CidrBlock": "192.168.32.0/19",
"DefaultForAz": false,
"MapPublicIpOnLaunch": false,
"State": "available",
"SubnetId": "subnet-057181b1e3728530e",
"VpcId": "vpc-0b40bd7c59dbe4277",
"OwnerId": "606671647913",
"AssignIpv6AddressOnCreation": false,
"Ipv6CidrBlockAssociationSet": [],
"SubnetArn": "arn:aws:ec2:eu-west-1:606671647913:subnet/subnet-057181b1e3728530e",
"EnableDns64": false,
"Ipv6Native": false,
"PrivateDnsNameOptionsOnLaunch": {
"HostnameType": "ip-name",
"EnableResourceNameDnsARecord": false,
"EnableResourceNameDnsAAAARecord": false
}
}
}
----
+
. Obtain the ID of the Aurora VPC route-table
+
.Command:
[source,bash]
----
aws ec2 describe-route-tables \
--filters Name=vpc-id,Values=vpc-0b40bd7c59dbe4277 \
--region eu-west-1
----
+
.Output:
[source,json]
----
{
"RouteTables": [
{
"Associations": [
{
"Main": true,
"RouteTableAssociationId": "rtbassoc-02dfa06f4c7b4f99a",
"RouteTableId": "rtb-04a644ad3cd7de351",
"AssociationState": {
"State": "associated"
}
}
],
"PropagatingVgws": [],
"RouteTableId": "rtb-04a644ad3cd7de351",
"Routes": [
{
"DestinationCidrBlock": "192.168.0.0/16",
"GatewayId": "local",
"Origin": "CreateRouteTable",
"State": "active"
}
],
"Tags": [],
"VpcId": "vpc-0b40bd7c59dbe4277",
"OwnerId": "606671647913"
}
]
}
----
+
. Associate the Aurora VPC route-table with each availability zone's subnet
.. Zone A
+
.Command:
[source,bash]
----
aws ec2 associate-route-table \
--route-table-id rtb-04a644ad3cd7de351 \
--subnet-id subnet-0d491a1a798aa878d \
--region eu-west-1
----
+
.. Zone B
+
.Command:
[source,bash]
----
aws ec2 associate-route-table \
--route-table-id rtb-04a644ad3cd7de351 \
--subnet-id subnet-057181b1e3728530e \
--region eu-west-1
----
+
. Create Aurora Subnet Group
+
.Command:
[source,bash]
----
aws rds create-db-subnet-group \
--db-subnet-group-name keycloak-aurora-subnet-group \
--db-subnet-group-description "Aurora DB Subnet Group" \
--subnet-ids subnet-0d491a1a798aa878d subnet-057181b1e3728530e \
--region eu-west-1
----
+
. Create Aurora Security Group
+
.Command:
[source,bash]
----
aws ec2 create-security-group \
--group-name keycloak-aurora-security-group \
--description "Aurora DB Security Group" \
--vpc-id vpc-0b40bd7c59dbe4277 \
--region eu-west-1
----
+
.Output:
[source,json]
----
{
"GroupId": "sg-0d746cc8ad8d2e63b"
}
----
+
. Create the Aurora DB Cluster
+
.Command:
[source,bash]
----
aws rds create-db-cluster \
--db-cluster-identifier keycloak-aurora \
--database-name keycloak \
--engine aurora-postgresql \
--engine-version 15.3 \
--master-username keycloak \
--master-user-password secret99 \
--vpc-security-group-ids sg-0d746cc8ad8d2e63b \
--db-subnet-group-name keycloak-aurora-subnet-group \
--region eu-west-1
----
+
NOTE: You should replace the `--master-username` and `--master-user-password` values.
The values specified here must be used when configuring the Keycloak DB credentials.
+
.Output:
[source,json]
----
{
"DBCluster": {
"AllocatedStorage": 1,
"AvailabilityZones": [
"eu-west-1b",
"eu-west-1c",
"eu-west-1a"
],
"BackupRetentionPeriod": 1,
"DatabaseName": "keycloak",
"DBClusterIdentifier": "keycloak-aurora",
"DBClusterParameterGroup": "default.aurora-postgresql15",
"DBSubnetGroup": "keycloak-aurora-subnet-group",
"Status": "creating",
"Endpoint": "keycloak-aurora.cluster-clhthfqe0h8p.eu-west-1.rds.amazonaws.com",
"ReaderEndpoint": "keycloak-aurora.cluster-ro-clhthfqe0h8p.eu-west-1.rds.amazonaws.com",
"MultiAZ": false,
"Engine": "aurora-postgresql",
"EngineVersion": "15.3",
"Port": 5432,
"MasterUsername": "keycloak",
"PreferredBackupWindow": "02:21-02:51",
"PreferredMaintenanceWindow": "fri:03:34-fri:04:04",
"ReadReplicaIdentifiers": [],
"DBClusterMembers": [],
"VpcSecurityGroups": [
{
"VpcSecurityGroupId": "sg-0d746cc8ad8d2e63b",
"Status": "active"
}
],
"HostedZoneId": "Z29XKXDKYMONMX",
"StorageEncrypted": false,
"DbClusterResourceId": "cluster-IBWXUWQYM3MS5BH557ZJ6ZQU4I",
"DBClusterArn": "arn:aws:rds:eu-west-1:606671647913:cluster:keycloak-aurora",
"AssociatedRoles": [],
"IAMDatabaseAuthenticationEnabled": false,
"ClusterCreateTime": "2023-11-01T10:40:45.964000+00:00",
"EngineMode": "provisioned",
"DeletionProtection": false,
"HttpEndpointEnabled": false,
"CopyTagsToSnapshot": false,
"CrossAccountClone": false,
"DomainMemberships": [],
"TagList": [],
"AutoMinorVersionUpgrade": true,
"NetworkType": "IPV4"
}
}
----
+
. Create Aurora DB instances
+
.. Create Zone A Writer instance
+
.Command:
[source,bash]
----
aws rds create-db-instance \
--db-cluster-identifier keycloak-aurora \
--db-instance-identifier "keycloak-aurora-instance-1" \
--db-instance-class db.t4g.large \
--engine aurora-postgresql \
--region eu-west-1
----
+
.. Create Zone B Reader instance
+
.Command:
[source,bash]
----
aws rds create-db-instance \
--db-cluster-identifier keycloak-aurora \
--db-instance-identifier "keycloak-aurora-instance-2" \
--db-instance-class db.t4g.large \
--engine aurora-postgresql \
--region eu-west-1
----
+
. Wait for all Writer and Reader instances to be ready
+
.Command:
[source,bash]
----
aws rds wait db-instance-available --db-instance-identifier keycloak-aurora-instance-1 --region eu-west-1
aws rds wait db-instance-available --db-instance-identifier keycloak-aurora-instance-2 --region eu-west-1
----
+
. [[aurora-writer-url]]Obtain the Writer endpoint URL for use by Keycloak
+
.Command:
[source,bash]
----
aws rds describe-db-clusters \
--db-cluster-identifier keycloak-aurora \
--query 'DBClusters[*].Endpoint' \
--region eu-west-1 \
--output text
----
+
.Output:
[source]
----
keycloak-aurora.cluster-clhthfqe0h8p.eu-west-1.rds.amazonaws.com
----
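
As a sanity check for the subnet step above, which requires that the cidr-block ranges of the availability zones do not overlap, the ranges can be compared before creating the subnets.
The following is a minimal sketch in plain bash for IPv4 only; the function names are hypothetical helpers and not part of any AWS tooling.

[source,bash]
----
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
  local ip prefix start
  IFS=/ read -r ip prefix <<< "$1"
  start=$(( $(ip_to_int "$ip") & (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  echo "$start $(( start + (1 << (32 - prefix)) - 1 ))"
}

# Succeed (exit status 0) if the two CIDR blocks share any address.
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<< "$(cidr_range "$1")"
  read -r s2 e2 <<< "$(cidr_range "$2")"
  [ "$s1" -le "$e2" ] && [ "$s2" -le "$e1" ]
}

# Usage with the two subnets from this example:
#   cidrs_overlap 192.168.0.0/19 192.168.32.0/19 || echo "no overlap"
----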

View file

@ -0,0 +1,23 @@
The simplest way to verify that a connection is possible between a ROSA cluster and an Aurora DB cluster is to deploy
`psql` on the OpenShift cluster and attempt to connect to the writer endpoint.
The following command creates a pod in the default namespace and establishes a `psql` connection with the Aurora cluster if possible.
Upon exiting the pod shell, the pod is deleted.
[source,bash]
----
USER=keycloak # <1>
PASSWORD=secret99 # <2>
DATABASE=keycloak # <3>
HOST=$(aws rds describe-db-clusters \
--db-cluster-identifier keycloak-aurora \# <4>
--query 'DBClusters[*].Endpoint' \
--region eu-west-1 \
--output text
)
kubectl run -i --tty --rm debug --image=postgres:13 --restart=Never -- psql postgresql://${USER}:${PASSWORD}@${HOST}/${DATABASE}
----
<1> Aurora DB user. This can be the same as the `--master-username` used when creating the DB.
<2> Aurora DB user password. This can be the same as the `--master-user-password` used when creating the DB.
<3> The name of the Aurora DB, such as `--database-name`.
<4> The name of your Aurora DB cluster.

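For a non-interactive variant of the check above, `pg_isready`, which ships with the `postgres` image, reports whether the server accepts connections without needing credentials.
The following is a minimal sketch; the function name and the pod name `pg-check` are arbitrary assumptions, and the host is the writer endpoint retrieved as shown above.

[source,bash]
----
# Hypothetical helper: probe a PostgreSQL endpoint from inside the cluster.
# Arguments: <host> <port>
check_aurora_reachable() {
  kubectl run pg-check --rm -i --image=postgres:13 --restart=Never -- \
    pg_isready -h "$1" -p "$2"
}

# Usage (HOST as retrieved in the previous example):
#   check_aurora_reachable "$HOST" 5432
----
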
View file

@ -0,0 +1,2 @@
NOTE: We provide these blueprints to show a minimal functionally complete example with a good baseline performance for regular installations.
You would still need to adapt it to your environment and your organization's standards and security best practices.

View file

@ -0,0 +1,7 @@
For the best performance, the values for the initial, minimum, and maximum database connection pool size should all be equal.
This avoids the cost of creating new database connections when a new request comes in.
Keeping the database connection open for as long as possible allows for server-side statement caching bound to a connection.
In the case of PostgreSQL, to use a server-side prepared statement, https://jdbc.postgresql.org/documentation/server-prepare/#activation[a query needs to be executed (by default) at least five times].
See the https://www.postgresql.org/docs/current/sql-prepare.html[PostgreSQL docs on prepared statements] for more information.
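
As a minimal sketch of equal pool sizes, assuming {project_name} is configured through environment variables, the three pool options can be derived from a single value.
The size of `30` is purely illustrative and is not a recommendation from this guide.

[source,bash]
----
# Illustrative value only; size the pool for your own load.
POOL_SIZE=30

# Use the same value for the initial, minimum, and maximum pool size.
export KC_DB_POOL_INITIAL_SIZE="$POOL_SIZE"
export KC_DB_POOL_MIN_SIZE="$POOL_SIZE"
export KC_DB_POOL_MAX_SIZE="$POOL_SIZE"
----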

View file

@ -0,0 +1,29 @@
// Attributes present in doc/kubernetes/collector/build.sh
// If the build.sh is changed, update the attributes in this file
// namespace
:ns: keycloak
// sites: crossdc.local.name and crossdc.remote.name
:site-a-cr: site-a
:site-b-cr: site-b
// crossdc.remote.secret
:sa-secret: xsite-token-secret
// crossdc.route.tls.keystore.secret
:ks-secret: xsite-keystore-secret
// crossdc.route.tls.truststore.secret
:ts-secret: xsite-truststore-secret
// hotrodPassword
:hr-password: strong-password
// cross-site service account
:sa: xsite-sa
// deployment name (hardcoded in ispn-helm chart)
:cluster-name: infinispan
// Other common attributes
:operator-docs: https://infinispan.org/docs/infinispan-operator/main/operator.html
:xsite-docs: https://infinispan.org/docs/stable/titles/xsite/xsite.html
:ocp: OpenShift
:ispn: Infinispan
:ispn-operator: Infinispan Operator
:kc: Keycloak
:site-a: Site-A
:site-b: Site-B

View file

@ -0,0 +1,99 @@
. Disable the replication from the {stale-site} site to the {keep-site} site by running the following command.
This prevents the clear request from reaching the {keep-site} site and deleting all the correct cached data.
+
.Command:
[source,bash,subs="+attributes"]
----
site take-offline --all-caches --site={keep-site-name}
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"offlineClientSessions" : "ok",
"authenticationSessions" : "ok",
"sessions" : "ok",
"clientSessions" : "ok",
"work" : "ok",
"offlineSessions" : "ok",
"loginFailures" : "ok",
"actionTokens" : "ok"
}
----
. Check that the replication status is `offline`.
+
.Command:
[source,bash,subs="+attributes"]
----
site status --all-caches --site={keep-site-name}
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"status" : "offline"
}
----
+
If the status is not `offline`, repeat the previous step.
+
WARNING: Make sure the replication status is `offline`; otherwise, clearing the data will clear both sites.
. Clear all the cached data in the {stale-site} site using the following commands:
+
.Command:
[source,bash,subs="+attributes"]
----
clearcache actionTokens
clearcache authenticationSessions
clearcache clientSessions
clearcache loginFailures
clearcache offlineClientSessions
clearcache offlineSessions
clearcache sessions
clearcache work
----
+
These commands do not print any output.
. Re-enable the cross-site replication from the {stale-site} site to the {keep-site} site.
+
.Command:
[source,bash,subs="+attributes"]
----
site bring-online --all-caches --site={keep-site-name}
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"offlineClientSessions" : "ok",
"authenticationSessions" : "ok",
"sessions" : "ok",
"clientSessions" : "ok",
"work" : "ok",
"offlineSessions" : "ok",
"loginFailures" : "ok",
"actionTokens" : "ok"
}
----
. Check that the replication status is `online`.
+
.Command:
[source,bash,subs="+attributes"]
----
site status --all-caches --site={keep-site-name}
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"status" : "online"
}
----
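
To avoid typing the eight `clearcache` commands from the steps above by hand, they can be generated from the list of cache names and pasted into the CLI session.
The following is a minimal sketch; the variable and function names are hypothetical.

[source,bash]
----
# The eight caches listed in the steps above.
CACHES="actionTokens authenticationSessions clientSessions loginFailures offlineClientSessions offlineSessions sessions work"

# Print one `clearcache` command per cache.
clearcache_commands() {
  local cache
  for cache in $CACHES; do
    echo "clearcache $cache"
  done
}

# Usage:
#   clearcache_commands
----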

View file

@ -0,0 +1,21 @@
. Connect to the {jdgserver_name} cluster using the {jdgserver_name} CLI tool:
+
.Command:
[source,bash,subs="+attributes"]
----
kubectl -n {ns} exec -it pods/{cluster-name}-0 -- ./bin/cli.sh --trustall --connect https://127.0.0.1:11222
----
+
The CLI asks for the username and password for the {jdgserver_name} cluster.
These credentials are the ones set in the configuring credentials section of the <@links.ha id="deploy-infinispan-kubernetes-crossdc"/> {section}.
+
.Output:
[source,bash,subs="+attributes"]
----
Username: developer
Password:
[{cluster-name}-0-29897@ISPN//containers/default]>
----
+
NOTE: The pod name depends on the cluster name defined in the {jdgserver_name} CR.
You can connect to any pod in the {jdgserver_name} cluster.

View file

@ -0,0 +1,120 @@
. Trigger the state transfer from the {keep-site} site to the {stale-site} site.
+
.Command:
[source,bash,subs="+attributes"]
----
site push-site-state --all-caches --site={stale-site-name}
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"offlineClientSessions" : "ok",
"authenticationSessions" : "ok",
"sessions" : "ok",
"clientSessions" : "ok",
"work" : "ok",
"offlineSessions" : "ok",
"loginFailures" : "ok",
"actionTokens" : "ok"
}
----
. Check that the replication status is `online` for all caches.
+
.Command:
[source,bash,subs="+attributes"]
----
site status --all-caches --site={stale-site-name}
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"status" : "online"
}
----
. Wait for the state transfer to complete by checking the output of the `push-site-status` command for all caches.
+
.Command:
[source,bash,subs="+attributes"]
----
site push-site-status --cache=actionTokens
site push-site-status --cache=authenticationSessions
site push-site-status --cache=clientSessions
site push-site-status --cache=loginFailures
site push-site-status --cache=offlineClientSessions
site push-site-status --cache=offlineSessions
site push-site-status --cache=sessions
site push-site-status --cache=work
----
+
.Output:
[source,bash,subs="+attributes"]
----
{
"{stale-site-name}" : "OK"
}
{
"{stale-site-name}" : "OK"
}
{
"{stale-site-name}" : "OK"
}
{
"{stale-site-name}" : "OK"
}
{
"{stale-site-name}" : "OK"
}
{
"{stale-site-name}" : "OK"
}
{
"{stale-site-name}" : "OK"
}
{
"{stale-site-name}" : "OK"
}
----
+
Check the table in {xsite-docs}#rest_v2_xsite_state_push_cross-site-operations-rest[this section of the Cross-Site Documentation] for the possible status values.
+
If an error is reported, repeat the state transfer for that specific cache.
+
.Command:
[source,bash,subs="+attributes"]
----
site push-site-state --cache=<cache-name> --site={stale-site-name}
----
. Clear/reset the state transfer status with the following commands:
+
.Command:
[source,bash,subs="+attributes"]
----
site clear-push-site-status --cache=actionTokens
site clear-push-site-status --cache=authenticationSessions
site clear-push-site-status --cache=clientSessions
site clear-push-site-status --cache=loginFailures
site clear-push-site-status --cache=offlineClientSessions
site clear-push-site-status --cache=offlineSessions
site clear-push-site-status --cache=sessions
site clear-push-site-status --cache=work
----
+
.Output:
[source,bash,subs="+attributes"]
----
"ok"
"ok"
"ok"
"ok"
"ok"
"ok"
"ok"
"ok"
----
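
When waiting for the state transfer, the `push-site-status` output shown above can be checked mechanically instead of by eye.
The following is a minimal sketch, assuming the output has been saved to a file in the format shown above; the function name and the file name in the usage comment are hypothetical.

[source,bash]
----
# Succeed (exit status 0) only if the input contains at least one status
# line and every status line reports "OK". Reads the CLI output from stdin.
all_pushes_ok() {
  local statuses
  statuses=$(grep ':') || return 1
  ! printf '%s\n' "$statuses" | grep -qv '"OK"'
}

# Usage (hypothetical file with the saved CLI output):
#   all_pushes_ok < push-site-status.txt && echo "state transfer complete"
----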

View file

@ -0,0 +1,35 @@
[[infinispan-credentials]]
. Configure the credential to access the Infinispan cluster.
+
Keycloak needs this credential to be able to authenticate with the Infinispan cluster.
The following `identities.yaml` file sets the username and password with admin permissions:
+
[source,yaml,subs="+attributes"]
----
credentials:
- username: developer
password: {hr-password}
roles:
- admin
----
+
The `identities.yaml` could be set in a secret as one of the following:
* As a Kubernetes Resource:
+
.Credential Secret
[.wrap]
[source,yaml]
----
include::../../examples/generated/ispn-single.yaml[tag=infinispan-credentials]
----
<1> The `identities.yaml` from the previous example base64 encoded.
+
* Using the CLI
+
[source,bash]
----
kubectl create secret generic connect-secret --from-file=identities.yaml
----
+
Check the https://infinispan.org/docs/infinispan-operator/main/operator.html#configuring-authentication[Configuring Authentication] documentation for more details.
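
For the Kubernetes Resource variant above, the `identities.yaml` content must be base64 encoded.
The following is a minimal sketch of producing that value on a single line; the function name is hypothetical, and the file is assumed to be in the current directory.

[source,bash]
----
# Hypothetical helper: base64-encode a file as one line without wrapping.
# Arguments: <file>
encode_identities() {
  base64 < "$1" | tr -d '\n'
}

# Usage:
#   encode_identities identities.yaml
----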

View file

@ -0,0 +1 @@
. Install the https://infinispan.org/docs/infinispan-operator/main/operator.html#installation[Infinispan Operator]

View file

@ -0,0 +1,2 @@
* OpenShift or Kubernetes cluster running
* Understanding of the https://infinispan.org/docs/infinispan-operator/main/operator.html[Infinispan Operator]

View file

@ -0,0 +1,3 @@
The number of JGroups threads is `200` by default, and can be configured using the Java system property `jgroups.thread_pool.max_threads`.
As shown in experiments, assuming a Keycloak cluster with 4 Pods, each Pod shouldn't have more than 50 worker threads so that it doesn't run out of threads in the JGroups thread pool of 200.
Use the Quarkus configuration option `quarkus.thread-pool.max-threads` to configure the maximum number of worker threads.
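
The arithmetic behind the example above is the JGroups thread pool size divided by the number of Pods.
The following is a minimal sketch of that calculation for other cluster sizes; the function name is hypothetical, and the result is an upper bound derived from the rule above, not a tuned value.

[source,bash]
----
# Largest per-Pod worker thread count that keeps the whole cluster
# within the JGroups thread pool.
# Arguments: <pods> <jgroups-max-threads>
max_worker_threads() {
  echo $(( $2 / $1 ))
}

# Usage: 4 Pods and the default JGroups pool of 200 threads
#   echo "quarkus.thread-pool.max-threads=$(max_worker_threads 4 200)"
----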

View file

@ -0,0 +1,12 @@
introduction
concepts-active-passive-sync
bblocks-active-passive-sync
deploy-aurora-multi-az
deploy-keycloak-kubernetes
deploy-infinispan-kubernetes-crossdc
connect-keycloak-to-external-infinispan
deploy-aws-route53-loadbalancer
operate-failover
operate-switch-over
operate-network-partition-recovery
operate-switch-back

View file

@ -81,6 +81,25 @@
</resources>
</configuration>
</execution>
<execution>
<id>copy-included-files</id>
<phase>validate</phase>
<goals>
<goal>copy-resources</goal>
</goals>
<configuration>
<outputDirectory>${basedir}/target/generated-guides/</outputDirectory>
<resources>
<resource>
<directory>${basedir}/</directory>
<includes>
<include>**/examples/**/*.*</include>
<include>**/partials/**/*.*</include>
</includes>
</resource>
</resources>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
@ -115,8 +134,6 @@
<idseparator>-</idseparator>
<docinfo1>true</docinfo1>
<imagesdir>../images</imagesdir>
<section>guide</section>
<sections>guides</sections>
<attribute-missing>warn</attribute-missing>
</attributes>
<logHandler>
@ -170,6 +187,17 @@
<outputDirectory>${project.build.directory}/generated-docs/getting-started</outputDirectory>
</configuration>
</execution>
<execution>
<id>high-availability-asciidoc-to-html</id>
<phase>generate-resources</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<sourceDirectory>${basedir}/target/generated-guides/high-availability</sourceDirectory>
<outputDirectory>${project.build.directory}/generated-docs/high-availability</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
<plugin>

View file

@ -1,10 +1,11 @@
<#import "/templates/options.adoc" as opts>
<#macro guide title summary priority=999 includedOptions="">
<#macro guide title summary priority=999 includedOptions="" preview="" tileVisible="true">
:guide-id: ${id}
:guide-title: ${title}
:guide-summary: ${summary}
:guide-priority: ${priority}
:guide-tile-visible: ${tileVisible}
:version: ${version}
include::../attributes.adoc[]
@ -12,6 +13,10 @@ include::../attributes.adoc[]
[[${id}]]
= ${title}
ifeval::["${preview}" == "true"]
WARNING: This {section} describes a feature which is currently in preview. Please provide your feedback while we're continuing to work on this.
endif::[]
<#nested>
<#if includedOptions?has_content>

View file

@ -1 +1,3 @@
<#macro server id>link:{links_server_${id}_url}[{links_server_${id}_name}]</#macro>
<#macro server id>link:{links_server_${id}_url}[{links_server_${id}_name}]</#macro>
<#macro operator id>link:{links_operator_${id}_url}[{links_operator_${id}_name}]</#macro>
<#macro ha id>link:{links_high-availability_${id}_url}[{links_high-availability_${id}_name}]</#macro>

View file

@ -31,7 +31,7 @@ public class Context {
for (File f : srcDir.listFiles((dir, f) -> f.endsWith(".adoc") && !f.equals("index.adoc"))) {
Guide guide = parser.parse(f);
if (guidePriorities != null) {
if (guidePriorities != null && guide != null) {
Integer priority = guidePriorities.get(guide.getId());
guide.setPriority(priority != null ? priority : Integer.MAX_VALUE);
}

View file

@ -41,7 +41,7 @@ public class GuideMojo extends AbstractMojo {
}
if (srcDir.getName().equals("images")) {
FileUtils.copyDirectory(srcDir, targetDir);
FileUtils.copyDirectoryStructure(srcDir, targetDir);
} else {
log.info("Guide dir: " + srcDir.getAbsolutePath());
log.info("Target dir: " + targetDir.getAbsolutePath());