Improving clustering guide and more information about cache-stack and custom stacks

Closes #12862
Pedro Igor 2022-08-02 17:32:57 -03:00 committed by Bruno Oliveira da Silva
parent f7d258f333
commit a5dd9e985c


@ -10,16 +10,16 @@ includedOptions="cache cache-*">
Keycloak is designed for high availability and multi-node clustered setups.
The current distributed cache implementation is built on top of https://infinispan.org[Infinispan], a high-performance, distributable in-memory data grid.
All available cache options are build options, so they need to be applied to a `build` of Keycloak before starting.
== Enable distributed caching
When you start Keycloak in production mode, by using the `start` command, caching is enabled and all Keycloak nodes in your network are discovered.
By default, caches use a `UDP` transport stack, so nodes are discovered using IP multicast transport based on UDP. This default is not suitable
for most production deployments, but you can choose other transport stacks, as described later in this guide.
To explicitly enable distributed infinispan caching, enter this command:
<@kc.build parameters="--cache=ispn"/>
When you start Keycloak in development mode, by using the `start-dev` command, Keycloak uses only local caches, applying the `--cache=local` option. When you start Keycloak in development mode, by using the `start-dev` command, Keycloak uses only local caches and distributed caches are completely disabled by implicitly setting the `--cache=local` option.
The `local` cache mode is intended only for development and testing purposes.
== Configuring caches
@ -27,44 +27,6 @@ Keycloak provides a cache configuration file with sensible defaults located at `
The cache configuration is a regular https://infinispan.org/docs/stable/titles/configuring/configuring.html[Infinispan configuration file].
=== Cache types and defaults
.Local caches
Keycloak caches persistent data locally to avoid unnecessary database requests.
The following caches are used:
* realms
* users
* authorization
* keys
.Invalidation of local caches
Local caching improves performance, but adds a challenge in multi-node setups.
When one Keycloak node updates data in the shared database, all other nodes need to be aware of it, so they invalidate that data from their caches.
The `work` cache is used for sending these invalidation messages.
.Authentication sessions
Authentication sessions are started when an unauthenticated user or service tries to log in to Keycloak.
The `authenticationSessions` distributed cache is used to save data during authentication of a particular user.
.User sessions
The following are the distributed caches for sessions of authenticated users and services:
* sessions
* clientSessions
* offlineSessions
* offlineClientSessions
These caches are used to save data about user sessions and clients attached to the sessions.
.Password brute force detection
The `loginFailures` distributed cache is used to track data about failed login attempts.
This cache is needed for the Brute Force Protection feature to work in a multi-node Keycloak setup.
.Action tokens
Action tokens are used for scenarios when a user needs to confirm an action asynchronously, for example in the emails sent by the forgot password flow.
The `actionTokens` distributed cache is used to track metadata about action tokens.
The following table gives an overview of the specific caches Keycloak uses.
You configure these caches in `conf/cache-ispn.xml`:
@ -75,24 +37,104 @@ You configure these caches in `conf/cache-ispn.xml`:
|authorization|Local|Cache persisted authorization data
|keys|Local|Cache external public keys
|work|Replicated|Propagate invalidation messages across nodes
|sessions|Distributed|Caches user sessions, when user is authenticated |authenticationSessions|Distributed|Caches authentication sessions, created/destroyed/expired during the authentication process
|authenticationSessions|Distributed|Caches authentication sessions, when user is authenticating |sessions|Distributed|Caches user sessions, created upon successful authentication and destroyed during logout, token revocation, or due to expiration
|offlineSessions|Distributed|Caches offline sessions |clientSessions|Distributed|Caches client sessions, created upon successful authentication to a specific client and destroyed during logout, token revocation, or due to expiration
|clientSessions|Distributed|Caches sessions for each client a user is authenticated with |offlineSessions|Distributed|Caches offline user sessions, created upon successful authentication and destroyed during logout, token revocation, or due to expiration
|offlineClientSessions|Distributed|Caches offline client sessions |offlineClientSessions|Distributed|Caches client sessions, created upon successful authentication to a specific client and destroyed during logout, token revocation, or due to expiration
|loginFailures|Distributed|keep track of failed logins, fraud detection
|actionTokens|Distributed|Caches action Tokens
|====
Local caches for realms, users, and authorization are configured to have 10,000 entries per default.
=== Cache types and defaults
.Local caches
Keycloak caches persistent data locally to avoid unnecessary round-trips to the database.
The following data is kept local to each node in the cluster using local caches:
* *realms* and related data like clients, roles, and groups.
* *users* and related data like granted roles and group memberships.
* *authorization* and related data like resources, permissions, and policies.
* *keys*
Local caches for realms, users, and authorization are configured to hold up to 10,000 entries per default.
The local key cache can hold up to 1,000 entries per default and defaults to expire every one hour.
Therefore, keys are forced to be periodically downloaded from external clients or identity providers.
To achieve optimal runtime performance and avoid additional round-trips to the database, review
the configuration of each cache and make sure the maximum number of entries is aligned with the size of your database. The more entries
you can cache, the less often the server needs to fetch data from the database. Evaluate the trade-offs between memory utilization and performance.
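As a minimal sketch of such tuning, the fragment below raises the entry limit of the `users` cache and shows the defaults described above for the `keys` cache. It assumes the `local-cache`, `memory`, and `expiration` elements of the Infinispan configuration schema as they are expected to appear in `conf/cache-ispn.xml`. The value `20000` is a hypothetical example, not a recommendation, so check the element names against the file shipped with your Keycloak version before applying it.

[source,xml]
----
<!-- Assumption: "users" is defined as a local cache in conf/cache-ispn.xml -->
<local-cache name="users">
    <!-- keep any existing child elements of this cache unchanged -->
    <!-- hypothetical limit, raised from the default of 10,000 entries -->
    <memory max-count="20000"/>
</local-cache>

<local-cache name="keys">
    <!-- 3600000 ms = one hour, matching the default expiration described above -->
    <expiration max-idle="3600000"/>
    <!-- default limit of 1,000 entries -->
    <memory max-count="1000"/>
</local-cache>
----

A larger `max-count` trades heap memory for fewer database reads, so it is worth watching memory usage after changing it.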
.Invalidation of local caches
Local caching improves performance, but adds a challenge in multi-node setups.
When one Keycloak node updates data in the shared database, all other nodes need to be aware of it, so they invalidate that data from their caches.
The `work` cache is a replicated cache used for sending these invalidation messages. The entries in this cache are very short-lived,
so you should not expect it to grow in size over time.
.Authentication sessions
Authentication sessions are created whenever a user tries to authenticate. They are automatically destroyed once the authentication process
completes or due to reaching their expiration time.
The `authenticationSessions` distributed cache is used to store authentication sessions and any other data associated with them
during the authentication process.
By relying on a distributable cache, authentication sessions are available to any node in the cluster so that users can be redirected
to any node without losing their authentication state. However, production-ready deployments should always consider session affinity and favor redirecting users
to the node where their sessions were initially created. By doing that, you are going to avoid unnecessary state transfer between nodes and improve
CPU, memory, and network utilization.
.User sessions
Once the user is authenticated, a user session is created. The user session tracks your active users and their state so that they can seamlessly
authenticate to any application without being asked for their credentials again. For each application the user authenticates with, a client session
is created too, so that the server can track the applications the user is authenticated with and their state on a per-application basis.
User and client sessions are automatically destroyed whenever the user performs a logout, the client performs a token revocation, or due to reaching their expiration time.
The following caches are used to store both user and client sessions:
* sessions
* clientSessions
By relying on a distributable cache, user and client sessions are available to any node in the cluster so that users can be redirected
to any node without losing their state. However, production-ready deployments should always consider session affinity and favor redirecting users
to the node where their sessions were initially created. By doing that, you are going to avoid unnecessary state transfer between nodes and improve
CPU, memory, and network utilization.
As an OpenID Connect Provider, the server is also capable of authenticating users and issuing offline tokens. As with regular user and client sessions,
when an offline token is issued by the server upon successful authentication, the server also creates user and client sessions. However, due to the nature
of offline tokens, offline sessions are handled differently: they are long-lived and should survive a complete cluster shutdown. Because of that, they are also persisted to the database.
The following caches are used to store offline sessions:
* offlineSessions
* offlineClientSessions
Upon a cluster restart, offline sessions are lazily loaded from the database and kept in a shared cache using the two caches above.
.Password brute force detection
The `loginFailures` distributed cache is used to track data about failed login attempts.
This cache is needed for the Brute Force Protection feature to work in a multi-node Keycloak setup.
.Action tokens
Action tokens are used for scenarios when a user needs to confirm an action asynchronously, for example in the emails sent by the forgot password flow.
The `actionTokens` distributed cache is used to track metadata about action tokens.
=== Configuring caches for availability
Distributed caches replicate cache entries on a subset of nodes in a cluster and assign entries to fixed owner nodes.
Each distributed cache has two owners per default, which means that two nodes have a copy of the specific cache entries.
Non-owner nodes query the owners of a specific cache to obtain data.
When both owner nodes are offline, all data is lost.
This situation usually leads to users being logged out at the next request and having to log in again.
The default number of owners is enough to survive the failure of one owner node in a cluster setup with at least three nodes. You are free
to change the number of owners to better fit your availability requirements. To change the number of owners, open `conf/cache-ispn.xml` and change the value for `owners=<value>` for the distributed caches to your desired value.
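For illustration only, a change to the `owners` attribute might look like the following sketch. It assumes the session caches are declared as `distributed-cache` elements in `conf/cache-ispn.xml`. The value `3` is a hypothetical choice that would keep entries available after two owner nodes fail, at the cost of more memory and network traffic per entry.

[source,xml]
----
<!-- Sketch: raise the number of owners from the default of 2 to 3 -->
<distributed-cache name="sessions" owners="3">
    <!-- keep the existing child elements of this cache unchanged -->
</distributed-cache>
<distributed-cache name="clientSessions" owners="3">
    <!-- keep the existing child elements of this cache unchanged -->
</distributed-cache>
----

The number of owners should not exceed the number of nodes in the cluster, and related caches such as `sessions` and `clientSessions` are best kept at the same value so that their availability matches.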
=== Specify your own cache configuration file
To specify your own cache configuration file, enter this command:
@ -147,6 +189,40 @@ To provide the dependencies to Keycloak, put the respective JAR in the `provider
<@kc.build parameters="--cache-stack=<ec2|google|azure>"/>
=== Custom transport stacks
If none of the available transport stacks meets the needs of your deployment, you can change your cache configuration file
and define your own transport stack.
For more details, see https://infinispan.org/docs/stable/titles/server/server.html#using-inline-jgroups-stacks_cluster-transport[Using inline JGroups stacks].
.Defining a custom transport stack
[source]
----
<jgroups>
    <stack name="my-encrypt-udp" extends="udp">
        <SSL_KEY_EXCHANGE keystore_name="server.jks"
            keystore_password="password"
            stack.combine="INSERT_AFTER"
            stack.position="VERIFY_SUSPECT"/>
        <ASYM_ENCRYPT asym_keylength="2048"
            asym_algorithm="RSA"
            change_key_on_coord_leave="false"
            change_key_on_leave="false"
            use_external_key_exchange="true"
            stack.combine="INSERT_BEFORE"
            stack.position="pbcast.NAKACK2"/>
    </stack>
</jgroups>
<cache-container name="keycloak">
    <transport lock-timeout="60000" stack="my-encrypt-udp"/>
    ...
</cache-container>
----
By default, the value set for the `cache-stack` option takes precedence over the transport stack you define in the cache configuration file.
If you define a custom stack, make sure the `cache-stack` option is not set so that your custom changes take effect.
=== Securing cache communication
The current Infinispan cache implementation should be secured by various security measures such as RBAC, ACLs, and Transport stack encryption. For more information about securing cache communication, see the https://infinispan.org/docs/dev/titles/security/security.html#[Infinispan security guide].