These steps are already automated for embedded Undertow; see the Cross-DC tests section in the [HOW-TO-RUN.md](../testsuite/integration-arquillian/HOW-TO-RUN.md) document. For Wildfly, they are not yet automated.
The following instructions are related to the Wildfly server.
Keycloak Cross-Datacenter replication uses JDG for the actual replication of Infinispan data between the datacenters, so it's good to read and understand
JDG first. We use the `Remote Client-Server Mode` described here - https://access.redhat.com/documentation/en-us/red_hat_jboss_data_grid/7.1/html/administration_and_configuration_guide/set_up_cross_datacenter_replication#configure_cross_datacenter_replication_remote_client_server_mode
Technical details
=================
Data
----
Keycloak is a stateful application, which uses 2 main sources of data.
* Database - used to persist permanent data (e.g. information about the users).
* Infinispan cache - used to cache persistent data from the DB and also to save some short-lived and often-changing metadata, like user sessions.
Infinispan is usually much faster than a database, however the data saved here are not permanent and usually don't survive cluster restarts.
Assume you have 2 datacenters called `site1` and `site2` .
For the cross-datacenter setup, we need to make sure that both sources of data work reliably and that Keycloak
servers from `site1` are eventually able to read the data saved by Keycloak servers on `site2`.
Based on the environment, you have some flexibility to decide if you prefer:
* Reliability - typically needed in Active/Active mode. Data written on `site1` need to be visible immediately on `site2`.
* Performance - typically in Active/Passive mode. Data written on `site1` don't need to be visible immediately on `site2`.
In some cases, they may not be visible on `site2` at all.
More details about this are in the [Modes section](#modes).
Request processing
------------------
In a typical scenario, the end user's browser sends an HTTP request to the [frontend loadbalancer server](http://www.keycloak.org/docs/latest/server_installation/index.html#_setting-up-a-load-balancer-or-proxy).
The loadbalancer is usually HTTPD or Wildfly with mod_cluster, NGinx, HA Proxy, or another kind of software or hardware loadbalancer.
The loadbalancer then forwards HTTP requests to the underlying Keycloak instances, which can be spread among
multiple datacenters (sites). Loadbalancers typically offer support for [sticky sessions](http://www.keycloak.org/docs/latest/server_installation/index.html#sticky-sessions),
which means that the loadbalancer is able to always forward HTTP requests from one user to the same Keycloak instance in the same datacenter.
There are also HTTP requests sent from client applications to the loadbalancer. Those HTTP requests are `backchannel requests`.
They are not seen by the end user's browser and can't be part of the sticky session between the user and the loadbalancer. Hence the loadbalancer can forward
a particular HTTP request to any Keycloak instance in any datacenter. This is challenging, as some OpenID Connect or SAML flows require
multiple HTTP requests from both the user and the application. Because we can't reliably rely on sticky sessions, some data need to be
replicated between datacenters, so they are seen by subsequent HTTP requests during a particular flow.
Modes
-----
Depending on your requirements, there are 2 basic operating modes for the cross-dc setup:
* Active/Passive - Here the users and client applications send requests just to the Keycloak nodes in a single datacenter.
The second datacenter is used just as a `backup` for saving the data. In case of a failure in the main datacenter,
the data can usually be restored from the second datacenter.
* Active/Active - Here the users and client applications send requests to the Keycloak nodes in both datacenters.
This means that data need to be visible immediately on both sites and available to be consumed immediately by Keycloak servers on both sites.
In particular, if a Keycloak server writes some data on `site1`, the data must be available for reading
by Keycloak servers on `site2` immediately after the write on `site1` is finished.
The active/passive mode is better for performance. More info about how to configure caches for both modes follows
in the [sync or async backups section](#sync-or-async-backups).
Database
--------
Keycloak uses an RDBMS to persist some metadata about realms, clients, users, etc. See [this chapter](http://www.keycloak.org/docs/latest/server_installation/index.html#_database)
for more details. In a cross-datacenter setup, we assume that either both datacenters talk to the same database, or every datacenter
has its own database node and both database nodes are synchronously replicated. In both cases, it's required that when a Keycloak server
on `site1` persists some data and commits the transaction, those data are immediately visible to subsequent DB transactions on `site2`.
Details of the DB setup are out of scope for Keycloak, however note that many RDBMS vendors like PostgreSQL, MariaDB or Oracle offer
replicated databases and synchronous replication. We tested Keycloak with these vendors:
TODO: Details about MariaDB and Oracle RAC versions etc.
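As an illustration, the Keycloak datasource in `standalone-ha.xml` on `site1` might point to the local node of a synchronously replicated MariaDB cluster. This is just a sketch; the hostname, credentials and database name are placeholders, not values from a tested setup:

```xml
<datasource jndi-name="java:jboss/datasources/KeycloakDS" pool-name="KeycloakDS" enabled="true" use-java-context="true">
    <!-- "mariadb-site1" is an illustrative hostname for the DB node local to site1 -->
    <connection-url>jdbc:mariadb://mariadb-site1:3306/keycloak</connection-url>
    <driver>mariadb</driver>
    <security>
        <user-name>keycloak</user-name>
        <password>password</password>
    </security>
</datasource>
```

The Keycloak servers on `site2` would use the same configuration with the connection URL pointing to their local, replicated DB node.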
Infinispan caches
-----------------
Here is an overview of the Infinispan caches. More details about the cache setup will follow later.
**Authentication sessions**
In Keycloak we have the concept of authentication sessions. There is a separate Infinispan cache `authenticationSessions` used to save data during the
authentication of a particular user. Requests to this cache usually involve just the browser and the Keycloak server, not the application. Hence we can
rely on sticky sessions, and the `authenticationSessions` cache content doesn't need to be replicated among datacenters, even if you are in Active/Active mode.
**Action tokens**
We have the concept of [action tokens](http://www.keycloak.org/docs/latest/server_development/index.html#_action_token_spi), which
are typically used for scenarios where a user needs to confirm some action asynchronously by email,
for example during the `forget password` flow. The `actionTokens` Infinispan cache is used to track metadata about action tokens
(e.g. which action token was already used, so it can't be reused a second time) and usually needs to be replicated between datacenters.
**Caching and invalidation of persistent data**
Keycloak uses Infinispan to cache persistent data to avoid many unnecessary requests to the database.
Caching is great for performance, however there is one additional challenge: when some Keycloak
server updates any data, all other Keycloak servers in all datacenters need to be aware of it, so they
can invalidate the particular data in their caches. Keycloak uses local Infinispan caches called `realms`, `users`
and `authorization` to cache persistent data.
We use a separate cache, `work`, which is replicated among all datacenters. The work cache itself doesn't cache
any real data. It is de facto used just for sending invalidation messages between cluster nodes and datacenters.
In other words, when some data is updated (e.g. user `john` is updated), the particular Keycloak node sends
the invalidation message to all other cluster nodes in the same datacenter and also to all other datacenters.
Every node then invalidates the particular data in its local cache once it receives the invalidation message.
**User sessions**
There are Infinispan caches `sessions`, `clientSessions`, `offlineSessions` and `offlineClientSessions`,
which usually need to be replicated between datacenters. Those caches are used to save data about user
sessions, which are valid for the whole life of one user's browser session. The caches need to deal with
HTTP requests from the end user and from the application. As described above, sticky sessions can't
always be used reliably, but we still want to ensure that subsequent HTTP requests can see the latest data.
Hence the data are usually replicated between datacenters.
**Brute force protection**
Finally, the `loginFailures` cache is used to track data about failed logins (e.g. how many times user `john`
entered a bad password on the username/password screen). The details are described [here](http://www.keycloak.org/docs/latest/server_admin/index.html#password-guess-brute-force-attacks) .
It is up to the admin whether they want this cache to be replicated between datacenters. To have an accurate count of login failures,
the replication is needed. On the other hand, not replicating this data can save some performance. So if performance is
more important than accurate counts of login failures, the replication can be avoided.
More details about how caches can be configured are [in this section](#tuning-jdg-cache-configuration) .
Communication details
---------------------
Under the covers, there are multiple separate Infinispan clusters here. Every Keycloak node is in a cluster
with the other Keycloak nodes in the same datacenter, but not with the Keycloak nodes in different datacenters.
A Keycloak node doesn't communicate directly with the Keycloak nodes from different datacenters. Keycloak nodes use an external JDG
(or Infinispan server) for communication between datacenters. This is done
through the [Infinispan HotRod protocol](http://infinispan.org/docs/8.2.x/user_guide/user_guide.html#using_hot_rod_server) .
The Infinispan caches on the Keycloak side need to be configured with a [remoteStore](http://infinispan.org/docs/8.2.x/user_guide/user_guide.html#remote_store),
to ensure that data are saved to the remote cache, which uses the HotRod protocol under the covers. There is a separate Infinispan cluster
between the JDG servers, so the data saved on JDG1 on `site1` are replicated to JDG2 on `site2` .
Finally, the receiver JDG server notifies the Keycloak servers in its cluster through Client Listeners, which is a feature of the
HotRod protocol. Keycloak nodes on `site2` then update their Infinispan caches and the particular user session is visible on the Keycloak nodes on
`site2` too.
See the picture in [intro section](#documentation-intro) for more details.
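As an illustration, the `sessions` cache in the Infinispan subsystem of the Keycloak `standalone-ha.xml` configured with a `remote-store` might look like this sketch. The `remote-cache` outbound socket binding (pointing to the JDG server in the same site) and the exact marshaller class are assumptions based on a typical setup; verify them against your Keycloak version:

```xml
<replicated-cache name="sessions" mode="SYNC">
    <!-- "remote-cache" refers to an outbound socket binding defined elsewhere in the file -->
    <remote-store cache="sessions" remote-servers="remote-cache" passivation="false"
                  fetch-state="false" purge="false" preload="false" shared="true">
        <property name="rawValues">true</property>
        <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
    </remote-store>
</replicated-cache>
```

The other cross-dc aware caches (`clientSessions`, `offlineSessions`, `offlineClientSessions`, `actionTokens`, `loginFailures`, `work`) would be configured analogously, each pointing to the remote cache of the same name.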
* Infinispan servers `jdg1` and `jdg2` are connected with each other through the RELAY2 protocol and `backup` based Infinispan caches in a
similar way as described in the [JDG documentation](https://access.redhat.com/documentation/en-us/red_hat_jboss_data_grid/7.1/html-single/administration_and_configuration_guide/#configure_cross_datacenter_replication_remote_client_server_mode) .
They communicate with the Infinispan server `jdg1` through the HotRod protocol (Remote cache). See the [previous section](#communication-details) for the details.
NOTE: Details about the configuration options inside `replicated-cache-configuration` are explained in a [later section](#tuning-jdg-cache-configuration), together
with possibilities to tweak some of those options.
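For illustration, a `replicated-cache-configuration` in the JDG `clustered.xml` on `jdg1` with a backup to `site2` might look like this sketch (the cache name and exact attribute values are illustrative; the meaning of the individual options is explained in the tuning section):

```xml
<replicated-cache-configuration name="sessions-cfg" mode="SYNC" start="EAGER" batching="false">
    <transaction mode="NON_DURABLE_XA" locking="PESSIMISTIC"/>
    <locking acquire-timeout="0"/>
    <backups>
        <!-- On jdg1 the backup points to the other site; on jdg2 it would point to site1 -->
        <backup site="site2" failure-policy="FAIL" strategy="SYNC" enabled="true"/>
    </backups>
</replicated-cache-configuration>
```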
but they are just connected through the RELAY2 protocol, and the TCP JGroups stack is used for communication between them. So the startup command is like this:
9.3) Open a 2nd browser and go to any of the nodes `http://localhost:12080/auth/admin`, `http://localhost:13080/auth/admin` or `http://localhost:14080/auth/admin` . After login, you should be able to see
the same sessions in the `Sessions` tab of the particular user, client or realm on all 4 servers.
```
2017-08-25 17:35:17,737 DEBUG [org.keycloak.models.sessions.infinispan.remotestore.RemoteCacheSessionListener] (Client-Listener-sessions-30012a77422542f5) Received event from remote store.
```
1) Site `site2` is entirely offline from the `site1` perspective. This means that all JDG servers on `site2` are down *or* the network between `site1` and `site2` is broken.
2) You run Keycloak servers and the JDG server `jdg1` in site `site1`.
3) Someone logs in on some Keycloak server on `site1`.
4) The Keycloak server from `site1` will try to write the session to the remote
cache on the `jdg1` server, which is supposed to back up data to the `jdg2` server in `site2`. See [this section](#communication-details) for
the details.
5) Server `jdg2` is offline or unreachable from `jdg1`. So the backup from `jdg1` to `jdg2` will fail.
6) An exception is thrown in the `jdg1` log, and the failure will be propagated from the `jdg1` server to the Keycloak servers as well, because
the default `FAIL` backup failure policy is configured. See [this section](#backup-failure-policy) for details about the backup policies.
7) The error will happen on the Keycloak side too, and the user may not be able to finish their login.
Depending on your environment, it may be more or less probable that the network between sites becomes unavailable or temporarily broken (split-brain).
In case this happens, it's good if the JDG servers on `site1` are aware of the fact that the JDG servers on `site2` are
unavailable (in other words, that `site2` is offline), so they stop
trying to reach the servers in the `jdg2` site and the backup failures won't happen. This is called `Take site offline` .
Take site offline
-----------------
There are 2 ways to take a site offline.
1) **Manually by admin** - The admin can use `jconsole` or another tool and run some JMX operations to manually take the particular site offline.
This is useful especially if the outage is planned. With `jconsole` or the CLI, you can connect to the `jdg1` server and take `site2` offline.
More details about this are
in the [JDG documentation](https://access.redhat.com/documentation/en-us/red_hat_jboss_data_grid/7.1/html/administration_and_configuration_guide/set_up_cross_datacenter_replication#taking_a_site_offline)
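Besides the manual JMX approach, Infinispan can also take a site offline automatically after repeated backup failures, via the `take-offline` element of the `backup` configuration in the JDG `clustered.xml`. A sketch with illustrative values (my reading of the semantics: `site2` is taken offline after 3 consecutive failed backup attempts, provided at least 60 seconds have passed since the first failure; check the JDG documentation for the exact behavior):

```xml
<backup site="site2" failure-policy="FAIL" strategy="SYNC" enabled="true">
    <!-- after-failures: number of consecutive failed backup attempts;
         min-wait: minimum time in milliseconds before the site is taken offline -->
    <take-offline after-failures="3" min-wait="60000"/>
</backup>
```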
Once the sites are put online, it's usually good to:
* Do the [state transfer](#state-transfer)
* Manually [clear the Keycloak caches](#clear-caches) .
State transfer
-------------
State transfer is a required manual step. JDG doesn't do this automatically, as
for example during a split-brain, it's the admin who may need to decide which site has preference and hence whether the state transfer
needs to be done bidirectionally between both sites or just unidirectionally (e.g. just from `site1` to `site2`, but not from `site2` to `site1`).
A bidirectional state transfer will ensure that entities which were created *after* the split-brain on `site1` will be transferred
to `site2` . This is not an issue, as they don't exist yet on `site2` . Similarly, entities created *after* the split-brain on `site2` will be transferred
to `site1` . The possibly problematic parts are the entities which existed *before* the split-brain on both sites and which were updated during the split-brain
on both sites. In that case, one of the sites will *win* and will overwrite the updates done during the split-brain by the other site.
Unfortunately there is no universal solution to this. Split-brains and network outages are just a state which is usually impossible to handle 100%
correctly with 100% consistent data between sites. In the case of Keycloak, it is typically not a critical issue. In the worst case, users
will need to re-login to their clients, or have an improper count of loginFailures tracked for brute force protection. See the JDG/JGroups/Infinispan
documentation for more details.
The state transfer can be done on the JDG side through JMX. The operation name is `pushState` . There are a few other operations to monitor the status, cancel the push state, etc.
More info about state transfer is in the JDG docs - https://access.redhat.com/documentation/en-us/red_hat_jboss_data_grid/7.1/html/administration_and_configuration_guide/set_up_cross_datacenter_replication#state_transfer_between_sites
Clear caches
------------
After a split-brain, it's also safe to manually clear the caches in the Keycloak admin console. The reason is that some data might have changed in the DB
on `site1`, while the event that the cache should be invalidated wasn't transferred during the split-brain to `site2` .
Hence Keycloak nodes on `site2` may still have some stale data in their caches.
To clear the caches, take a look at http://www.keycloak.org/docs/latest/server_admin/index.html#_clear-cache .
When the network is back, it's sufficient to clear the cache just on one Keycloak node on any random site.
The cache invalidation event will be sent to all the other Keycloak nodes in all sites. However, it needs
to be done for all the caches (realms, users, keys).
Tuning JDG cache configuration
==============================
Backup failure policy
---------------------
By default, the backup `failure-policy` in the Infinispan cache configuration in the JDG `clustered.xml`
file is configured as `FAIL` . Depending on your preferences, you may change it to `WARN` or `IGNORE` .
The difference between `FAIL` and `WARN` is that when the JDG server tries to back up data to the other site and the backup fails (e.g. the second site
is temporarily unreachable or there is a concurrent transaction which is trying to update the same entity),
then with the `FAIL` policy the failure is propagated back to the caller (the Keycloak server).
The Keycloak server will then retry the particular operation a few times. However, if the second site is really unavailable,
the retry will fail too and the user might see an error after some longer timeout (e.g. 1 minute).
With the `WARN` policy, failed backups are not propagated from the JDG server to the Keycloak server. The user won't see the error and the
failed backup will just be ignored. There will just be some shorter timeout,
typically 10 seconds, as that's the default timeout for a backup. It can be changed with the `timeout` attribute of the `backup` element.
There won't be retries. There will just be a WARNING message in the JDG server log.
The potential issue is that in some cases there may be just a very short network outage between sites, where the retry
(usage of the `FAIL` policy) may help; so with `WARN` (without retry), there will be some data inconsistencies between the sites.
This can also happen if there is an attempt to update the same entity concurrently on both sites.
The question is how bad the inconsistencies are. Usually it means just that a user needs to re-authenticate.
With the `WARN` policy, it may happen that the single-use cache, which is provided by the `actionTokens` cache and which ensures that a
particular key is really single-use,
may "successfully" write the same key twice. But, for example, the OAuth2 specification mentions that a code must be single-use.
See [here](https://tools.ietf.org/html/rfc6749#section-10.5) .
With the `WARN` policy, this may not be strictly guaranteed and the same code could be written twice if there is an attempt to write
it concurrently on both sites.
If there is a real longer network outage or split-brain, then with both `FAIL` and `WARN`, the other site will be taken offline after some
time and failures, as described [here](#take-site-offline) . With the default 1 minute timeout, it usually takes 1-3 minutes
until all the involved caches are taken offline. Then all the operations will work fine from the end user perspective.
You just need to manually restore the site when it's back online, as mentioned [here](#take-site-online) .
In summary, if you expect frequent longer outages between sites and it's acceptable for you to have some data inconsistencies and
a not 100% accurate single-use cache, but you never want end users to see errors and long timeouts, then switch to `WARN` .
The difference between `WARN` and `IGNORE` is that with `IGNORE`, there are not even warnings in the JDG log. See more details in the Infinispan
documentation.
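Both options discussed above are attributes of the `backup` element in the JDG `clustered.xml`. A sketch switching to the `WARN` policy while keeping the default 10 second backup timeout explicit (the site name is illustrative):

```xml
<backups>
    <!-- failure-policy: FAIL (default), WARN or IGNORE; timeout is in milliseconds -->
    <backup site="site2" strategy="SYNC" failure-policy="WARN" timeout="10000" enabled="true"/>
</backups>
```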
Lock acquisition timeout
------------------------
The default configuration uses transactions in `NON_DURABLE_XA` mode with an acquire timeout of 0. This means that
the transaction will fail fast if there is another transaction in progress for the same key.
The reason for switching this to 0 instead of the default 10 seconds was to avoid possible deadlock issues. With Keycloak,
it can happen that the same entity (typically a session entity or loginFailure) is updated concurrently from both sites.
This can cause a deadlock under some circumstances, which would block the transaction for 10 seconds.
With timeout 0, the transaction fails immediately and is then retried from Keycloak if the backup `failure-policy` with
the value `FAIL` is configured. As long as the second concurrent transaction has finished, the retry will usually be successful and the entity
will have the updates from both concurrent transactions applied.
We see very good consistency and results for concurrent transactions with this configuration, so at least for now it is
recommended to keep it.
The only (non-functional) problem is the exception in the JDG log, which happens every time the lock is not
immediately available.
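In the JDG `clustered.xml`, this corresponds to the `locking` and `transaction` elements inside the cache configuration; a sketch:

```xml
<!-- inside the replicated-cache-configuration element -->
<!-- acquire-timeout="0": fail fast instead of waiting (default 10 seconds) for the lock -->
<locking acquire-timeout="0"/>
<transaction mode="NON_DURABLE_XA" locking="PESSIMISTIC"/>
```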
SYNC or ASYNC backups
---------------------
One important attribute of the `backup` element is `strategy`; you need to decide whether it should be `SYNC` or `ASYNC` . We have
7 caches which might be cross-dc aware, and those can be configured in 3 different modes regarding cross-dc:
1) SYNC backup
2) ASYNC backup
3) No backup at all
If the `SYNC` backup is used, then the backup is synchronous and the operation is considered finished on the caller (Keycloak server) side
once the backup is processed on the second site. This has worse performance than `ASYNC`, but on the other hand, you are sure that subsequent reads
of the particular entity (e.g. user session) on `site2` will see the updates from `site1` . It's also needed if you want data
consistency, as with `ASYNC` the caller is not notified at all if the backup to the other site failed.
For some caches, it's even possible to not back up at all and completely skip writing data to the JDG server. To set this up, omit
the `remote-store` element for the particular cache on the Keycloak side (file `KEYCLOAK_HOME/standalone/configuration/standalone-ha.xml`);
the particular `replicated-cache` element is then also not needed on the JDG side.
By default, all 7 caches are configured with `SYNC` backup, which is the safest option. A few things to consider:
* If you are using active/passive mode (all Keycloak servers are in the single site `site1` and the JDG server in `site2` is used purely as a
backup; more details [here](#modes)), then it's usually fine to use the `ASYNC` strategy for all the caches to save performance.
* The `work` cache is used mainly to send some messages (e.g. cache invalidation events) to the other site. It's also used to ensure that some
special events (e.g. userStorage synchronizations) happen just on a single site. It's recommended to keep it with the `SYNC` strategy.
* The `actionTokens` cache is used as a single-use cache to track that some tokens/tickets were used just once. For example
[action tokens](#infinispan-caches) or OAuth2 codes. It's possible to switch it to `ASYNC` to save some performance, but then it's not
guaranteed that a particular ticket is really single-use. For example, if there is a concurrent request for the same ticket on both sites, then
it's possible that both requests will be successful with the `ASYNC` strategy. So it depends whether you prefer better
security (`SYNC` strategy) or better performance (`ASYNC` strategy).
* The `loginFailures` cache may be used in any of the 3 modes. If there is no backup at all, it means that the count of login failures for a user
(see [here](#infinispan-caches) for details) will be counted separately for every site. This has some security implications,
however it has some performance advantages. It also mitigates the possible risk of DoS. For example, if an attacker
simulates 1000 concurrent requests trying the username/password of a user on both sites, it will mean lots of messages
between the sites, which may result in network congestion. The `ASYNC` strategy might be even worse, as the attacker's
requests won't be blocked by waiting for the backup to the other site, resulting in potentially even bigger network traffic.
The count of login failures also won't be accurate with the `ASYNC` strategy.
For environments with a slower network between datacenters and a probability of DoS, it's recommended to not back up the `loginFailures` cache at all.
* Caches `sessions` and `clientSessions` are usually recommended to be kept with the `SYNC` strategy. Switching them to the `ASYNC` strategy is possible only
if you are sure that user requests and backchannel requests (requests from client applications to Keycloak as described [here](#request-processing))
will always be processed on the same site. This is true, for example, if:
* You use active/passive mode as described [here](#modes).
* All your client applications are using the Keycloak [Javascript Adapter](http://www.keycloak.org/docs/latest/securing_apps/index.html#_javascript_adapter).
The Javascript adapter sends the backchannel requests within the browser, and hence they participate in the browser sticky session and
will end on the same cluster node (hence on the same site) as the other browser requests of this user.
* Your loadbalancer is able to serve requests based on the client IP address (location) and the client applications are deployed on both sites.
For example, you have 2 sites, LON and NYC. As long as your applications are deployed on both the LON and NYC sites too, you can ensure
that all the user requests from London users will be redirected to the applications on the LON site and also to the Keycloak servers on the LON site.
Backchannel requests from the LON site client deployments will end on the Keycloak servers on the LON site too. On the other hand, for American
users, all the Keycloak requests, application requests and backchannel requests will be processed on the NYC site.
* For `offlineSessions` and `offlineClientSessions`, it's similar, with the difference that you don't even need to back them up at all
if you never plan to use offline tokens for any of your client applications.
Generally, if you are in doubt and performance is not a blocker for you, it's safer to keep the caches with the `SYNC` strategy.
WARNING: When switching between SYNC and ASYNC backup, make sure that you edit the `strategy` attribute of the `backup` element.
* If there are exceptions during the startup of the Keycloak server like:
```
17:33:58,605 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation] (ServerService Thread Pool -- 59) ISPN004007: Exception encountered. Retry 10 out of 10: org.infinispan.client.hotrod.exceptions.TransportException:: Could not fetch transport
...
Caused by: org.infinispan.client.hotrod.exceptions.TransportException:: Could not connect to server: 127.0.0.1:12232
at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.<init>(TcpTransport.java:82)
```
it usually means that the Keycloak server is not able to reach the JDG server in its own datacenter. Make sure that the
firewall is set up as expected and that the JDG server is reachable.
* If there are exceptions during the startup of the Keycloak server like:
```
16:44:18,321 WARN [org.infinispan.client.hotrod.impl.protocol.Codec21] (ServerService Thread Pool -- 57) ISPN004005: Error received from the server: javax.transaction.RollbackException: ARJUNA016053: Could not commit transaction.
...
```
then it's good to check the log of the corresponding JDG server of your site and check whether it failed to back up
to the other site. If the backup site is unavailable, then it's recommended to switch it offline, so that the JDG server
won't try to back up to the offline site and hence the operations will pass successfully on the Keycloak server side as well.
More details are described in [this section](#administration-of-cross-dc-deployment) .
* Check the Infinispan statistics, which are again available through JMX. For example, you can try to log in and then see if the new session
was successfully written to both JDG servers and is available in the `sessions` cache there. This can be done indirectly by checking
the count of elements in the `sessions` cache through the MBean `jboss.datagrid-infinispan:type=Cache,name="sessions(repl_sync)",manager="clustered",component=Statistics`
and its attribute `numberOfEntries` . After login, there should be one more entry for `numberOfEntries` on both JDG servers on both sites.
* Enable DEBUG logging as described [here](#keycloak-servers-setup) . For example, if you log in and you think that the new session is not
available on the second site, it's good to check the Keycloak server logs and verify that the listeners were triggered as described in
[the setup section](#keycloak-servers-setup). If you don't know and want to ask on the keycloak-user mailing list, it's good to send the log
files from the Keycloak servers on both datacenters with the email. Either add the log snippets to the mail or put the logs somewhere and reference them from the mail,
to avoid sending big attachments to the mailing list.
* If you updated an entity (e.g. a user) on the Keycloak server on `site1` and you don't see that entity updated on the Keycloak server on `site2`, then
the issue can be either in the replication of the synchronous database itself, or the Keycloak caches may not be properly invalidated. You may
try to temporarily disable the Keycloak caches as described [here](http://www.keycloak.org/docs/latest/server_installation/index.html#disabling-caching) .