[[crossdc-mode]]
=== Using cross-site replication mode

:tech_feature_name: cross-site replication mode
:tech_feature_disabled: false
include::../templates/techpreview.adoc[]

Use cross-site replication mode to run {project_name} in a cluster across multiple data centers. Typically you use data center sites that are in different geographic regions. When using this mode, each data center will have its own cluster of {project_name} servers.

This documentation will refer to the following example architecture diagram to illustrate and describe a simple cross-site replication use case.

[[archdiagram]]
.Example Architecture Diagram
image:{project_images}/cross-dc-architecture.png[]

[[prerequisites]]
==== Prerequisites

As this is an advanced topic, we recommend you first read the following, which provide valuable background knowledge:

* link:{installguide_clustering_link}[Clustering with {project_name}]
When setting up for cross-site replication, you will use multiple independent {project_name} clusters, so you must understand how a cluster works and the basic concepts and requirements such as load balancing, shared databases, and multicasting.

ifeval::[{project_product}==true]
* link:{jdgserver_crossdcdocs_link}[Red Hat Data Grid Cross-Site Replication]
{project_name} uses Red Hat Data Grid (RHDG) for the replication of data between the data centers.
endif::[]

ifeval::[{project_community}==true]
* link:https://infinispan.org/docs/11.0.x/titles/xsite/xsite.html#xsite_replication[Infinispan Cross-Site Replication] replicates data across clusters in separate geographic locations.
endif::[]

[[technicaldetails]]
==== Technical details

This section provides an introduction to the concepts and details of how {project_name} cross-site replication is accomplished.

.Data

{project_name} is a stateful application. It uses the following as data sources:

* A database is used to persist permanent data, such as user information.
* An Infinispan cache is used to cache persistent data from the database and also to save some short-lived and frequently changing metadata, such as for user sessions. Infinispan is usually much faster than a database; however, the data saved using Infinispan are not permanent and are not expected to persist across cluster restarts.

In our example architecture, there are two data centers called `site1` and `site2`. For cross-site replication, we must make sure that both sources of data work reliably and that {project_name} servers from `site1` are eventually able to read the data saved by {project_name} servers on `site2`.

Based on the environment, you have the option to decide if you prefer:

* Reliability - which is typically used in Active/Active mode. Data written on `site1` must be visible immediately on `site2`.

* Performance - which is typically used in Active/Passive mode. Data written on `site1` does not need to be visible immediately on `site2`. In some cases, the data may not be visible on `site2` at all.

For more details, see <>.

[[requestprocessing]]
==== Request processing

An end user's browser sends an HTTP request to the link:{installguide_loadbalancer_link}[front end load balancer]. This load balancer is usually HTTPD or WildFly with mod_cluster, NGINX, HAProxy, or perhaps some other kind of software or hardware load balancer.

The load balancer then forwards the HTTP requests it receives to the underlying {project_name} instances, which can be spread among multiple data centers. Load balancers typically offer support for link:{installguide_stickysessions_link}[sticky sessions], which means that the load balancer is able to always forward all HTTP requests from the same user to the same {project_name} instance in the same data center.

HTTP requests that are sent from client applications to the load balancer are called `backchannel requests`.
These are not seen by an end user's browser and therefore cannot be part of a sticky session between the user and the load balancer. For backchannel requests, the load balancer can forward the HTTP request to any {project_name} instance in any data center. This is challenging as some OpenID Connect and some SAML flows require multiple HTTP requests from both the user and the application. Because we cannot reliably depend on sticky sessions to force all the related requests to be sent to the same {project_name} instance in the same data center, we must instead replicate some data across data centers, so the data are seen by subsequent HTTP requests during a particular flow.

[[modes]]
==== Modes

According to your requirements, there are two basic operating modes for cross-site replication:

* Active/Passive - Here the users and client applications send the requests to the {project_name} nodes in just a single data center. The second data center is used just as a `backup` for saving the data. If the main data center fails, the data can usually be restored from the second data center.

* Active/Active - Here the users and client applications send the requests to the {project_name} nodes in both data centers. This means that data need to be visible immediately on both sites and available to be consumed immediately from {project_name} servers on both sites. This is especially true if a {project_name} server writes some data on `site1`, and it is required that the data are available for reading by {project_name} servers on `site2` immediately after the write on `site1` is finished.

The active/passive mode is better for performance. For more information about how to configure caches for either mode, see: <>.

[[database]]
==== Database

{project_name} uses a relational database management system (RDBMS) to persist some metadata about realms, clients, users, and so on.
See link:{installguide_database_link}[this chapter] of the server installation guide for more details. In a cross-site replication setup, we assume that either both data centers talk to the same database or that every data center has its own database node and both database nodes are synchronously replicated across the data centers. In both cases, it is required that when a {project_name} server on `site1` persists some data and commits the transaction, those data are immediately visible to subsequent DB transactions on `site2`.

Details of DB setup are out of scope for {project_name}; however, many RDBMS vendors like MariaDB and Oracle offer replicated databases and synchronous replication. We test {project_name} with these vendors:

* Oracle Database 19c RAC

* Galera 3.12 cluster for MariaDB server version 10.1.19-MariaDB

[[cache]]
==== Infinispan caches

This section begins with a high-level description of the Infinispan caches. More details of the cache setup follow.

.Authentication sessions
In {project_name} we have the concept of authentication sessions. There is a separate Infinispan cache called `authenticationSessions` used to save data during authentication of a particular user. Requests that use this cache usually involve only a browser and the {project_name} server, not the application. Here we can rely on sticky sessions and the `authenticationSessions` cache content does not need to be replicated across data centers, even if you are in Active/Active mode.

ifeval::[{project_community}==true]
.Action tokens
We also have the concept of link:{developerguide_actiontoken_link}[action tokens], which are used typically for scenarios when the user needs to confirm an action asynchronously by email. For example, during the `forget password` flow the `actionTokens` Infinispan cache is used to track metadata about related action tokens, such as which action token was already used, so it cannot be reused a second time. This usually needs to be replicated across data centers.
endif::[]

.Caching and invalidation of persistent data
{project_name} uses Infinispan to cache persistent data to avoid many unnecessary requests to the database. Caching improves performance; however, it adds an additional challenge. When a {project_name} server updates any data, all other {project_name} servers in all data centers need to be aware of it, so they can invalidate the affected data in their caches. {project_name} uses local Infinispan caches called `realms`, `users`, and `authorization` to cache persistent data.

We use a separate cache, `work`, which is replicated across all data centers. The work cache itself does not cache any real data. It is used only for sending invalidation messages between cluster nodes and data centers. In other words, when data is updated, such as the user `john`, the {project_name} node sends the invalidation message to all other cluster nodes in the same data center and also to all other data centers. After receiving the invalidation notice, every node then invalidates the appropriate data from its local cache.

.User sessions
There are Infinispan caches called `sessions`, `clientSessions`, `offlineSessions`, and `offlineClientSessions`, all of which usually need to be replicated across data centers. These caches are used to save data about user sessions, which are valid for the length of a user's browser session. The caches must handle the HTTP requests from the end user and from the application. As described above, sticky sessions cannot be reliably used in this instance, but we still want to ensure that subsequent HTTP requests can see the latest data. For this reason, the data are usually replicated across data centers.

.Brute force protection
Finally, the `loginFailures` cache is used to track data about failed logins, such as how many times the user `john` entered a bad password. The details are described link:{adminguide_bruteforce_link}[here].
It is up to the admin whether this cache should be replicated across data centers. To have an accurate count of login failures, the replication is needed. On the other hand, not replicating this data can save some performance. So if performance is more important than accurate counts of login failures, the replication can be avoided.

For more detail about how caches can be configured, see <>.

[[communication]]
==== Communication details

{project_name} uses multiple, separate clusters of Infinispan caches. Every {project_name} node is in the cluster with the other {project_name} nodes in the same data center, but not with the {project_name} nodes in different data centers. A {project_name} node does not communicate directly with the {project_name} nodes from different data centers. {project_name} nodes use external JDG (actually {jdgserver_name} servers) for communication across data centers. This is done using the link:https://infinispan.org/docs/10.1.x/titles/server/server.html#hot_rod[Infinispan Hot Rod protocol].

The Infinispan caches on the {project_name} side must be configured with the link:https://infinispan.org/docs/10.1.x/titles/configuring/configuring.html#remote_cache_store[remoteStore] to ensure that data are saved to the remote cache. There is a separate Infinispan cluster between the JDG servers, so the data saved on JDG1 on `site1` are replicated to JDG2 on `site2`. The receiving {jdgserver_name} server notifies the {project_name} servers in its cluster through Client Listeners, which are a feature of the Hot Rod protocol. {project_name} nodes on `site2` then update their Infinispan caches and the particular user session is also visible on {project_name} nodes on `site2`.

See the <> for more details.

include::crossdc/assembly-setting-up-crossdc.adoc[leveloffset=3]

[[setup]]
==== Setting up cross-site replication with {jdgserver_name} {jdgserver_version}

This example for {jdgserver_name} {jdgserver_version} involves two data centers, `site1` and `site2`.
Each data center consists of 1 {jdgserver_name} server and 2 {project_name} servers. We will end up with 2 {jdgserver_name} servers and 4 {project_name} servers in total.

* `Site1` consists of {jdgserver_name} server, `server1`, and 2 {project_name} servers, `node11` and `node12`.

* `Site2` consists of {jdgserver_name} server, `server2`, and 2 {project_name} servers, `node21` and `node22`.

* {jdgserver_name} servers `server1` and `server2` are connected to each other through the RELAY2 protocol and `backup` based {jdgserver_name} caches in a similar way as described in the link:{jdgserver_crossdcdocs_link}[{jdgserver_name} documentation].

* {project_name} servers `node11` and `node12` form a cluster with each other, but they do not communicate directly with any server in `site2`. They communicate with the Infinispan server `server1` using the Hot Rod protocol (Remote cache). See <> for the details.

* The same details apply for `node21` and `node22`. They cluster with each other and communicate only with the `server2` server using the Hot Rod protocol.

Our example setup assumes that all 4 {project_name} servers talk to the same database. In production, it is recommended to use separate synchronously replicated databases across data centers as described in <>.

[[jdgsetup]]
===== Setting up the {jdgserver_name} server

Follow these steps to set up the {jdgserver_name} server:

. Download {jdgserver_name} {jdgserver_version} server and unzip to a directory you choose. This location will be referred to in later steps as `SERVER1_HOME`.
. Make the following changes to the JGroups subsystem configuration in `SERVER1_HOME/server/conf/infinispan-xsite.xml`:
.. Add the `xsite` channel, which will use the `tcp` stack, under the `channels` element:
+
```xml
```
+
.. Add a `relay` element to the end of the `udp` stack. We will configure it in a way that our site is `site1` and the other site, where we will back up, is `site2`:
+
```xml
... false
```
+
.. Configure the `tcp` stack to use the `TCPPING` protocol instead of `MPING`. Remove the `MPING` element and replace it with `TCPPING`. The `initial_hosts` element points to the hosts `server1` and `server2`:
+
```xml
server1[7600],server2[7600] false ...
```
NOTE: This is just an example setup to have things quickly running. In production, you are not required to use the `tcp` stack for the JGroups `RELAY2`; you can configure any other stack. For example, you could use the default udp stack, if the network between your data centers is able to support multicast. Just make sure that the {jdgserver_name} and {project_name} clusters cannot discover each other. Similarly, you are not required to use `TCPPING` as the discovery protocol. In production, you probably won't use `TCPPING` due to its static nature. Finally, site names are also configurable. Details of such a setup are out of scope of the {project_name} documentation. See the {jdgserver_name} documentation and JGroups documentation for more details.
+
. Add this into `SERVER1_HOME/standalone/configuration/clustered.xml` under the cache-container named `clustered`:
+
```xml
...
ifeval::[{project_product}==true]
endif::[]
```
+
NOTE: Details about the configuration options inside `replicated-cache-configuration` are explained in <>, which includes information about tweaking some of those options.
+
ifeval::[{project_community}==true]
WARNING: Unlike in previous versions, the {jdgserver_name} server `replicated-cache-configuration` needs to be configured without the `transaction` element. See <> for more details.
endif::[]
+
. Some {jdgserver_name} server releases require authorization before accessing protected caches over the network.
+
NOTE: You should not see any issue if you use the recommended {jdgserver_name} {jdgserver_version} server, and in that case this step can (and should) be ignored. Issues related to authorization may exist only for some other versions of {jdgserver_name} server.
+
{project_name} requires updates to the `___script_cache` cache containing scripts. If you get errors accessing this cache, you will need to set up authorization in the `clustered.xml` configuration as described below:
+
.. In the `` section, add a security realm:
+
```xml
... not-so-secret-password
```
.. In the server core subsystem, add `` as below:
+
```xml
...
```
.. In the endpoint subsystem, add authentication configuration to the Hot Rod connector:
+
```xml
...
```
+
. Copy the server to the second location, which will be referred to later as `SERVER2_HOME`.
. In `SERVER2_HOME/standalone/configuration/clustered.xml`, exchange `site1` with `site2` and vice versa, both in the configuration of `relay` in the JGroups subsystem and in the configuration of `backups` in the cache subsystem. For example:
.. The `relay` element should look like this:
+
```xml
false
```
+
.. The `backups` element should look like this:
+
```xml
_server1:site1 site2 --> _server2:site2
```
When you use the MBean `jgroups:type=protocol,cluster="cluster",protocol=GMS`, you should see that the attribute member contains just a single member:
.. On `SERVER1` it should be like this:
+
```
(1) server1
```
+
.. And on `SERVER2` like this:
+
```
(1) server2
```
+
NOTE: In production, you can have more {jdgserver_name} servers in every data center. You just need to ensure that {jdgserver_name} servers in the same data center are using the same multicast address (in other words, the same `jboss.default.multicast.address` during startup). Then in jconsole, in the `GMS` protocol view, you will see all the members of the current cluster.

[[serversetup]]
===== Setting up {project_name} servers

. Unzip the {project_name} server distribution to a location you choose. It will be referred to later as `NODE11`.
. Configure a shared database for the KeycloakDS datasource. It is recommended to use MySQL or MariaDB for testing purposes. See <> for more details.
+
In production you will likely need to have a separate database server in every data center and both database servers should be synchronously replicated to each other. In the example setup, we just use a single database and connect all 4 {project_name} servers to it.
+
. Edit `NODE11/standalone/configuration/standalone-ha.xml`:
.. Add the attribute `site` to the JGroups UDP protocol:
+
```xml
```
+
.. Add the `remote-store` under the `work` cache:
+
```xml
true org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory
ifeval::[{project_product}==true]
2.6
endif::[]
ifeval::[{project_community}==true]
2.9
endif::[]
```
+
.. Add the `remote-store` like this under the `sessions` cache:
+
```xml
true org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory
ifeval::[{project_product}==true]
2.6
endif::[]
ifeval::[{project_community}==true]
2.9
endif::[]
```
+
.. Do the same for the `offlineSessions`, `clientSessions`, `offlineClientSessions`, `loginFailures`, and `actionTokens` caches (the only difference from the `sessions` cache is that the `cache` property value is different):
+
```xml
true org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory
ifeval::[{project_product}==true]
2.6
endif::[]
ifeval::[{project_community}==true]
2.9
endif::[]
true org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory
ifeval::[{project_product}==true]
2.6
endif::[]
ifeval::[{project_community}==true]
2.9
endif::[]
true org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory
ifeval::[{project_product}==true]
2.6
endif::[]
ifeval::[{project_community}==true]
2.9
endif::[]
true org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory
ifeval::[{project_product}==true]
2.6
endif::[]
ifeval::[{project_community}==true]
2.9
endif::[]
true org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory
ifeval::[{project_product}==true]
2.6
endif::[]
ifeval::[{project_community}==true]
2.9
endif::[]
```
+
.. Add an outbound socket binding for the remote store to the `socket-binding-group` element configuration:
+
```xml
```
+
.. The configuration of the distributed cache `authenticationSessions` and the other caches is left unchanged.
.. It is recommended to add the `remoteStoreSecurityEnabled` property with the value `false` (or `true` if you enabled security for the {jdgserver_name} servers as described above) to the `connectionsInfinispan` SPI in the `keycloak-server` subsystem:
+
```xml
... ... ...
```
.. Optionally enable DEBUG logging under the `logging` subsystem:
+
```xml
```
+
. Copy `NODE11` to 3 other directories, referred to later as `NODE12`, `NODE21`, and `NODE22`.
. Start `NODE11`:
+
[source,subs="+quotes"]
----
cd NODE11/bin
./standalone.sh -c standalone-ha.xml -Djboss.node.name=node11 -Djboss.site.name=site1 \
  -Djboss.default.multicast.address=234.56.78.1 -Dremote.cache.host=server1 \
  -Djava.net.preferIPv4Stack=true -b _PUBLIC_IP_ADDRESS_
----
+
. Start `NODE12`:
+
[source,subs="+quotes"]
----
cd NODE12/bin
./standalone.sh -c standalone-ha.xml -Djboss.node.name=node12 -Djboss.site.name=site1 \
  -Djboss.default.multicast.address=234.56.78.1 -Dremote.cache.host=server1 \
  -Djava.net.preferIPv4Stack=true -b _PUBLIC_IP_ADDRESS_
----
+
The cluster nodes should be connected. Something like this should be in the log of both NODE11 and NODE12:
+
```
Received new cluster view for channel keycloak: [node11|1] (2) [node11, node12]
```
NOTE: The channel name in the log might be different.

. Start `NODE21`:
+
[source,subs="+quotes"]
----
cd NODE21/bin
./standalone.sh -c standalone-ha.xml -Djboss.node.name=node21 -Djboss.site.name=site2 \
  -Djboss.default.multicast.address=234.56.78.2 -Dremote.cache.host=server2 \
  -Djava.net.preferIPv4Stack=true -b _PUBLIC_IP_ADDRESS_
----
+
It shouldn't be connected to the cluster with `NODE11` and `NODE12`, but to a separate one:
+
```
Received new cluster view for channel keycloak: [node21|0] (1) [node21]
```
+
. Start `NODE22`:
+
[source,subs="+quotes"]
----
cd NODE22/bin
./standalone.sh -c standalone-ha.xml -Djboss.node.name=node22 -Djboss.site.name=site2 \
  -Djboss.default.multicast.address=234.56.78.2 -Dremote.cache.host=server2 \
  -Djava.net.preferIPv4Stack=true -b _PUBLIC_IP_ADDRESS_
----
+
It should be in a cluster with `NODE21`:
+
```
Received new cluster view for channel keycloak: [node21|1] (2) [node21, node22]
```
+
NOTE: The channel name in the log might be different.

. Test:
.. Go to `http://node11:8080/auth/` and create the initial admin user.
.. Go to `http://node11:8080/auth/admin` and log in as admin to the admin console.
.. Open a second browser and go to any of the nodes `http://node12:8080/auth/admin` or `http://node21:8080/auth/admin` or `http://node22:8080/auth/admin`. After login, you should be able to see the same sessions in the `Sessions` tab of a particular user, client, or realm on all 4 servers.
.. After making any change in the {project_name} admin console (for example, updating a user or a realm), the update should be immediately visible on any of the 4 nodes, as caches should be properly invalidated everywhere.
.. Check the server logs if needed. After login or logout, a message like this should appear in `NODEXY/standalone/log/server.log` on all the nodes:
+
```
2017-08-25 17:35:17,737 DEBUG [org.keycloak.models.sessions.infinispan.remotestore.RemoteCacheSessionListener] (Client-Listener-sessions-30012a77422542f5) Received event from remote store. Event 'CLIENT_CACHE_ENTRY_REMOVED', key '193489e7-e2bc-4069-afe8-f1dfa73084ea', skip 'false'
```

[[administration]]
==== Administration of cross-site deployment

This section contains some tips and options related to cross-site replication.

* When you run the {project_name} server inside a data center, it is required that the database referenced in the `KeycloakDS` datasource is already running and available in that data center.
It is also necessary that the {jdgserver_name} server referenced by the `outbound-socket-binding`, which is referenced from the Infinispan cache `remote-store` element, is already running. Otherwise the {project_name} server will fail to start.

* Every data center can have more database nodes if you want to support database failover and better reliability. Refer to the documentation of your database and JDBC driver for details on how to set this up on the database side and how the `KeycloakDS` datasource on the {project_name} side needs to be configured.

* Every data center can have more {jdgserver_name} servers running in the cluster. This is useful if you want some failover and better fault tolerance. With the Hot Rod protocol used for communication between {jdgserver_name} servers and {project_name} servers, the {jdgserver_name} servers automatically send the new topology to the {project_name} servers when the {jdgserver_name} cluster changes, so the remote store on the {project_name} side knows which {jdgserver_name} servers it can connect to. Read the {jdgserver_name} and WildFly documentation for more details.

* It is highly recommended that a master {jdgserver_name} server is running in every site before the {project_name} servers in **any** site are started. In our example, we started both `server1` and `server2` first, before all {project_name} servers. If you still need to run the {project_name} server while the backup site is offline, it is recommended to manually switch the backup site offline on the {jdgserver_name} servers on your site, as described in <>. If you do not manually switch the unavailable site offline, the first startup may fail or there may be some exceptions during startup until the backup site is taken offline automatically due to the configured count of failed operations.

[[onoffline]]
==== Bringing sites offline and online

For example, assume this scenario:

. Site `site2` is entirely offline from the `site1` perspective.
This means that all {jdgserver_name} servers on `site2` are off *or* the network between `site1` and `site2` is broken.
. You run {project_name} servers and {jdgserver_name} server `server1` in site `site1`.
. Someone logs in on a {project_name} server on `site1`.
. The {project_name} server from `site1` will try to write the session to the remote cache on the `server1` server, which is supposed to back up data to the `server2` server in `site2`. See <> for more information.
. Server `server2` is offline or unreachable from `server1`, so the backup from `server1` to `server2` will fail.
. An exception is thrown in the `server1` log and the failure will be propagated from the `server1` server to the {project_name} servers as well, because the default `FAIL` backup failure policy is configured. See <> for details around the backup policies.
. The error will happen on the {project_name} side too and the user may not be able to finish the login.

Depending on your environment, it may be more or less probable that the network between sites is unavailable or temporarily broken (split-brain). If this happens, it is good for the {jdgserver_name} servers on `site1` to be aware that the {jdgserver_name} servers on `site2` are unavailable, so they stop trying to reach the servers in `site2` and the backup failures do not happen. This is called `Take site offline`.

.Take site offline

There are 2 ways to take the site offline.

**Manually by admin** - Admin can use `jconsole` or another tool and run some JMX operations to manually take the particular site offline. This is useful especially if the outage is planned. With `jconsole` or CLI, you can connect to the `server1` server and take `site2` offline. More details about this are available in the
ifeval::[{project_product}==true]
link:{jdgserver_crossdcdocs_link}[{jdgserver_name} documentation].
endif::[]
ifeval::[{project_community}==true]
link:{jdgserver_crossdcdocs_link}[{jdgserver_name} documentation].
endif::[]

WARNING: These steps usually need to be done for all the {project_name} caches mentioned in <>.

**Automatically** - After some number of failed backups, `site2` will usually be taken offline automatically. This happens because of the `take-offline` element inside the cache configuration, as configured in <>.

```xml
```

This example shows that the site will be taken offline automatically for the particular single cache if there are at least 3 subsequent failed backups and there is no successful backup within 60 seconds.

Automatically taking a site offline is useful especially if the broken network between sites is unplanned. The disadvantage is that there will be some failed backups until the network outage is detected, which could also mean failures on the application side. For example, there will be failed logins for some users or big login timeouts, especially if `failure-policy` with value `FAIL` is used.

WARNING: Whether a site is offline is tracked separately for every cache.

.Take site online

Once your network is back and `site1` and `site2` can talk to each other, you may need to put the site online. This needs to be done manually through JMX or CLI in a similar way as taking a site offline. Again, you may need to check all the caches and bring them online.

Once the sites are put online, it's usually good to:

* Do the <>.

* Manually <>.

[[statetransfer]]
==== State transfer

State transfer is a required, manual step. The {jdgserver_name} server does not do this automatically. For example, during split-brain, only the admin can decide which site has preference and hence whether state transfer needs to be done bidirectionally between both sites or just unidirectionally, as in only from `site1` to `site2`, but not from `site2` to `site1`.

A bidirectional state transfer will ensure that entities which were created *after* split-brain on `site1` will be transferred to `site2`.
This is not an issue as they do not yet exist on `site2`. Similarly, entities created *after* split-brain on `site2` will be transferred to `site1`. Possibly problematic parts are those entities which existed *before* split-brain on both sites and which were updated during split-brain on both sites. When this happens, one of the sites will *win* and will overwrite the updates done during split-brain by the second site.

Unfortunately, there is no universal solution to this. Split-brain and network outages are states that are usually impossible to handle 100% correctly with 100% consistent data between sites. In the case of {project_name}, it typically is not a critical issue. In the worst case, users will need to log in again to their clients, or have an inaccurate count of loginFailures tracked for brute force protection. See the {jdgserver_name}/JGroups documentation for more tips on how to deal with split-brain.

The state transfer can also be done on the {jdgserver_name} server side through JMX. The operation name is `pushState`. There are a few other operations to monitor status, cancel push state, and so on. More info about state transfer is available in the link:{jdgserver_crossdcdocs_link}[{jdgserver_name} docs].

[[clearcache]]
==== Clear caches

After split-brain it is safe to manually clear caches in the {project_name} admin console. This is because some data might have changed in the database on `site1` while the event saying that the cache should be invalidated was not transferred to `site2` during split-brain. Hence {project_name} nodes on `site2` may still have some stale data in their caches.

To clear the caches, see link:{adminguide_clearcache_link}[{adminguide_clearcache_name}].

When the network is back, it is sufficient to clear the cache on just one {project_name} node on any site. The cache invalidation event will be sent to all the other {project_name} nodes in all sites.
However, it needs to be done for all the caches (realms, users, keys). See link:{adminguide_clearcache_link}[{adminguide_clearcache_name}] for more information.

[[tuningcache]]
==== Tuning the {jdgserver_name} cache configuration

This section contains tips and options for configuring your JDG cache.

[[backupfailure]]
.Backup failure policy

By default, the backup `failure-policy` in the Infinispan cache configuration in the {jdgserver_name} `clustered.xml` file is set to `FAIL`. You may change it to `WARN` or `IGNORE`, as you prefer.

The difference between `FAIL` and `WARN` is that when `FAIL` is used and the {jdgserver_name} server tries to back data up to the other site and the backup fails, then the failure will be propagated back to the caller (the {project_name} server). The backup might fail because the second site is temporarily unreachable or there is a concurrent transaction which is trying to update the same entity. In this case, the {project_name} server will then retry the operation a few times. However, if the retry fails, then the user might see the error after a longer timeout.

When using `WARN`, the failed backups are not propagated from the {jdgserver_name} server to the {project_name} server. The user won't see the error and the failed backup will be just ignored. There will be a shorter timeout, typically 10 seconds as that's the default timeout for backup. It can be changed by the attribute `timeout` of the `backup` element. There won't be retries. There will just be a WARNING message in the {jdgserver_name} server log.

The potential issue is that in some cases there may be just a short network outage between sites, where the retry (usage of the `FAIL` policy) may help, so with `WARN` (without retry), there will be some data inconsistencies across sites. This can also happen if there is an attempt to update the same entity concurrently on both sites.

How bad are these inconsistencies?
Usually, it only means that a user will need to re-authenticate.

When using the `WARN` policy, the single-use cache, which is provided by the `actionTokens` cache and which ensures that a particular key is really used only once, may "successfully" write the same key twice. However, the OAuth2 specification, for example, link:https://datatracker.ietf.org/doc/html/rfc6749#section-10.5[mentions] that a code must be single-use. With the `WARN` policy, this may not be strictly guaranteed and the same code could be written twice if there is an attempt to write it concurrently in both sites.

If there is a longer network outage or split-brain, then with both `FAIL` and `WARN`, the other site will be taken offline after some time and failures as described in <>. With the default 1 minute timeout, it is usually 1-3 minutes until all the involved caches are taken offline. After that, all the operations will work fine from an end user perspective. You only need to manually restore the site when it is back online as mentioned in <>.

In summary, if you expect frequent, longer outages between sites and it is acceptable for you to have some data inconsistencies and a single-use cache that is not 100% accurate, but you never want end users to see errors and long timeouts, then switch to `WARN`.

The difference between `WARN` and `IGNORE` is that with `IGNORE`, warnings are not written to the {jdgserver_name} log. See more details in the Infinispan documentation.

.Lock acquisition timeout
The default configuration uses transactions in `NON_DURABLE_XA` mode with an acquire timeout of 0. This means that a transaction will fail fast if there is another transaction in progress for the same key. The reason for setting this to 0 instead of the default 10 seconds was to avoid possible deadlock issues. With {project_name}, it can happen that the same entity (typically a session entity or loginFailure) is updated concurrently from both sites.
This can cause a deadlock under some circumstances, which will cause the transaction to be blocked for 10 seconds. See link:https://issues.redhat.com/browse/JDG-1318[this JIRA report] for details.

With timeout 0, the transaction will fail immediately and will then be retried from {project_name} if the backup `failure-policy` is set to `FAIL`. Provided that the second concurrent transaction has finished, the retry will usually be successful and the entity will have the updates from both concurrent transactions applied.

We see very good consistency and results for concurrent transactions with this configuration, and it is recommended to keep it.

The only (non-functional) problem is the exception in the {jdgserver_name} server log, which appears every time the lock is not immediately available.

[[backups]]
==== SYNC or ASYNC backups

An important part of the `backup` element is the `strategy` attribute. You must decide whether it needs to be `SYNC` or `ASYNC`. There are 7 caches that might be cross-site replication aware, and these can be configured in 3 different modes regarding cross-site:

. SYNC backup
. ASYNC backup
. No backup at all

If the `SYNC` backup is used, the backup is synchronous and the operation is considered finished on the caller ({project_name} server) side once the backup is processed on the second site. This has worse performance than `ASYNC`, but on the other hand, you are sure that subsequent reads of the particular entity, such as a user session, on `site2` will see the updates from `site1`. It is also needed if you want data consistency, because with `ASYNC` the caller is not notified at all if the backup to the other site failed.

For some caches, it is even possible to not back up at all and completely skip writing data to the {jdgserver_name} server.
To set this up, do not use the `remote-store` element for the particular cache on the {project_name} side (file `KEYCLOAK_HOME/standalone/configuration/standalone-ha.xml`); the particular `replicated-cache` element is then also not needed on the {jdgserver_name} server side.

By default, all 7 caches are configured with `SYNC` backup, which is the safest option. Here are a few things to consider:

* If you are using active/passive mode (all {project_name} servers are in a single site `site1` and the {jdgserver_name} server in `site2` is used purely as a backup; see <> for more details), then it is usually fine to use the `ASYNC` strategy for all the caches to improve performance.

* The `work` cache is used mainly to send some messages, such as cache invalidation events, to the other site. It is also used to ensure that some special events, such as userStorage synchronizations, happen only on a single site. It is recommended to keep this set to `SYNC`.

* The `actionTokens` cache is used as a single-use cache to track that some tokens/tickets were used just once, for example action tokens or OAuth2 codes. It is possible to set this to `ASYNC` to slightly improve performance, but then it is not guaranteed that a particular ticket is really single-use. For example, if there are concurrent requests for the same ticket in both sites, then it is possible that both requests will be successful with the `ASYNC` strategy. So what you set here will depend on whether you prefer better security (`SYNC` strategy) or better performance (`ASYNC` strategy).

* The `loginFailures` cache may be used in any of the 3 modes. If there is no backup at all, the count of login failures for a user will be tracked separately for every site (see <> for details). This has some security implications, but it also has some performance advantages. It also mitigates the possible risk of denial of service (DoS) attacks.
For example, if an attacker simulates 1000 concurrent requests using the username and password of the user on both sites, it will mean lots of messages being passed between the sites, which may result in network congestion. The `ASYNC` strategy might be even worse, as the attacker's requests won't be blocked by waiting for the backup to the other site, resulting in potentially even more congested network traffic. The count of login failures also will not be accurate with the `ASYNC` strategy.

For environments with a slower network between data centers and a probability of DoS attacks, it is recommended to not back up the `loginFailures` cache at all.

* It is recommended to keep the `sessions` and `clientSessions` caches in `SYNC`. Switching them to `ASYNC` is possible only if you are sure that user requests and backchannel requests (requests from client applications to {project_name} as described in <>) will always be processed on the same site. This is true, for example, if:

** You use active/passive mode as described <>.

** All your client applications are using the {project_name} {adapterguide_link_js_adapter}[JavaScript Adapter]. The JavaScript adapter sends the backchannel requests within the browser, so they participate in the browser sticky session and end up on the same cluster node (and hence on the same site) as the other browser requests of this user.

** Your load balancer is able to serve the requests based on client IP address (location) and the client applications are deployed on both sites.
+
For example, you have 2 sites, LON and NYC. As long as your applications are deployed in both the LON and NYC sites too, you can ensure that all the user requests from London users will be redirected to the applications in the LON site and also to the {project_name} servers in the LON site. Backchannel requests from the LON site client deployments will end up on {project_name} servers in the LON site too.
On the other hand, for the American users, all the {project_name} requests, application requests, and backchannel requests will be processed on the NYC site.
+
* For `offlineSessions` and `offlineClientSessions` it is similar, with the difference that you do not even need to back them up at all if you never plan to use offline tokens for any of your client applications.

Generally, if you are in doubt and performance is not a blocker for you, it's safer to keep the caches in `SYNC` strategy.

WARNING: Regarding the switch to SYNC/ASYNC backup, make sure that you edit the `strategy` attribute of the `backup` element. For example like this:

```xml
```

Note the `mode` attribute of cache-configuration element.

[[troubleshooting]]
==== Troubleshooting

The following tips are intended to assist you should you need to troubleshoot:

* It is recommended to go through the <> and have this one working first, so that you have some understanding of how things work. It is also wise to read this entire document before you start.

* In JConsole, check the cluster status (GMS) and the JGroups status (RELAY) of {jdgserver_name} as described in <>. If things do not look as expected, then the issue is likely in the setup of the {jdgserver_name} servers.

* For the {project_name} servers, you should see a message like this during the server startup:
+
```
18:09:30,156 INFO  [org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory] (ServerService Thread Pool -- 54) Node name: node11, Site name: site1
```
+
Check that the site name and the node name look as expected during the startup of the {project_name} server.

* Check that the {project_name} servers are in a cluster as expected, including that only the {project_name} servers from the same data center are in a cluster with each other. This can also be checked in JConsole through the GMS view. See link:{installguide_troubleshooting_link}[cluster troubleshooting] for additional details.
* If there are exceptions during startup of the {project_name} server like this:
+
```
17:33:58,605 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation] (ServerService Thread Pool -- 59) ISPN004007: Exception encountered. Retry 10 out of 10: org.infinispan.client.hotrod.exceptions.TransportException:: Could not fetch transport
...
Caused by: org.infinispan.client.hotrod.exceptions.TransportException:: Could not connect to server: 127.0.0.1:12232
	at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.<init>(TcpTransport.java:82)
```
+
it usually means that the {project_name} server is not able to reach the {jdgserver_name} server in its own data center. Make sure that the firewall is set up as expected and that it is possible to connect to the {jdgserver_name} server.

* If there are exceptions during startup of the {project_name} server like this:
+
```
16:44:18,321 WARN  [org.infinispan.client.hotrod.impl.protocol.Codec21] (ServerService Thread Pool -- 57) ISPN004005: Error received from the server: javax.transaction.RollbackException: ARJUNA016053: Could not commit transaction.
...
```
+
then check the log of the corresponding {jdgserver_name} server of your site and check whether it has failed to back up to the other site. If the backup site is unavailable, then it is recommended to switch it offline, so that the {jdgserver_name} server won't try to back up to the offline site and the operations will pass successfully on the {project_name} server side as well. See <> for more information.

* Check the Infinispan statistics, which are available through JMX. For example, try to log in and then see if the new session was successfully written to both {jdgserver_name} servers and is available in the `sessions` cache there. This can be done indirectly by checking the count of elements in the `sessions` cache for the MBean `jboss.datagrid-infinispan:type=Cache,name="sessions(repl_sync)",manager="clustered",component=Statistics` and attribute `numberOfEntries`.
After login, the value of `numberOfEntries` should have increased by one on the {jdgserver_name} servers on both sites.

* Enable DEBUG logging as described <>. For example, if you log in and you think that the new session is not available on the second site, it's good to check the {project_name} server logs and check that listeners were triggered as described in the <>. If you cannot find the cause and want to ask on the keycloak-user mailing list, it is helpful to send the log files from the {project_name} servers on both data centers in the email. Either add the log snippets to the email or put the logs somewhere and reference them in the email.

* If you updated an entity, such as `user`, on the {project_name} server on `site1` and you do not see that entity updated on the {project_name} server on `site2`, then the issue can be either in the synchronous replication of the database itself or in {project_name} caches that are not properly invalidated. You may try to temporarily disable the {project_name} caches as described link:{installguide_disablingcaching_link}[here] to narrow down whether the issue is at the database replication level. It may also help to manually connect to the database and check if the data is updated as expected. This is specific to every database, so you will need to consult the documentation for your database.

* Sometimes you may see exceptions related to locks like this in the {jdgserver_name} server log:
+
```
(HotRodServerHandler-6-35) ISPN000136: Error executing command ReplaceCommand, writing keys [[B0x033E243034396234..[39]]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key [B0x033E243034396234..[39] and requestor GlobalTx:server1:4353. Lock is held by GlobalTx:server1:4352
```
+
Those exceptions are not necessarily an issue. They may happen any time a concurrent edit of the same entity is triggered on both DCs. This is common in a deployment.
Usually the {project_name} server is notified about the failed operation and will retry it, so from the user's point of view, there is usually no issue.

* If there are exceptions during startup of the {project_name} server, like this:
+
```
16:44:18,321 WARN  [org.infinispan.client.hotrod.impl.protocol.Codec21] (ServerService Thread Pool -- 55) ISPN004005: Error received from the server: java.lang.SecurityException: ISPN000287: Unauthorized access: subject 'Subject with principal(s): []' lacks 'READ' permission
...
```
+
These log entries are the result of {project_name} automatically detecting whether authentication is required on {jdgserver_name}, and they mean that authentication is necessary. At this point, either the server starts successfully, in which case you can safely ignore these entries, or the server fails to start. If the server fails to start, ensure that {jdgserver_name} has been configured properly for authentication as described in <>. To prevent this log entry from being included, you can force authentication by setting the `remoteStoreSecurityEnabled` property to `true` in the `spi=connectionsInfinispan/provider=default` configuration:
+
```xml
... ... ...
```

* If you try to authenticate with {project_name} to your application, but authentication fails with an infinite number of redirects in your browser and you see errors like this in the {project_name} server log:
+
```
2017-11-27 14:50:31,587 WARN  [org.keycloak.events] (default task-17) type=LOGIN_ERROR, realmId=master, clientId=null, userId=null, ipAddress=aa.bb.cc.dd, error=expired_code, restart_after_timeout=true
```
+
it probably means that your load balancer needs to be set to support sticky sessions. Make sure that the provided route name used during startup of the {project_name} server (property `jboss.node.name`) contains the correct name used by the load balancer server to identify the current server.
* If the {jdgserver_name} `work` cache grows indefinitely, you may be experiencing https://issues.redhat.com/browse/JDG-987[this {jdgserver_name} issue], which is caused by cache items not being properly expired. In that case, update the cache declaration with an empty `` tag like this:
+
```xml
```

* If you see warnings in the {jdgserver_name} server log like:
+
```
18:06:19,687 WARN  [org.infinispan.server.hotrod.Decoder2x] (HotRod-ServerWorker-7-12) ISPN006011: Operation 'PUT_IF_ABSENT' forced to return previous value should be used on transactional caches, otherwise data inconsistency issues could arise under failure situations
18:06:19,700 WARN  [org.infinispan.server.hotrod.Decoder2x] (HotRod-ServerWorker-7-10) ISPN006010: Conditional operation 'REPLACE_IF_UNMODIFIED' should be used with transactional caches, otherwise data inconsistency issues could arise under failure situations
```
+
you can just ignore them. To avoid the warnings, the caches on the {jdgserver_name} server side could be changed to transactional caches, but this is not recommended because it can trigger other issues caused by the bug https://issues.redhat.com/browse/ISPN-9323. So for now, the warnings just need to be ignored.
* If you see errors in the {jdgserver_name} server log like:
+
```
12:08:32,921 ERROR [org.infinispan.server.hotrod.CacheDecodeContext] (HotRod-ServerWorker-7-11) ISPN005003: Exception reported: org.infinispan.server.hotrod.InvalidMagicIdException: Error reading magic byte or message id: 7
	at org.infinispan.server.hotrod.HotRodDecoder.readHeader(HotRodDecoder.java:184)
	at org.infinispan.server.hotrod.HotRodDecoder.decodeHeader(HotRodDecoder.java:133)
	at org.infinispan.server.hotrod.HotRodDecoder.decode(HotRodDecoder.java:92)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
```
+
and you see some similar errors in the {project_name} log, it can indicate that incompatible versions of the Hot Rod protocol are being used. This is likely to happen when you try to use {project_name} with an old version of the Infinispan server. It will help if you add the `protocolVersion` property as an additional property to the `remote-store` element in the {project_name} configuration file. For example:
+
```xml
2.6
```
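
The lock-related `ISPN000299` exceptions mentioned in this section are a consequence of fail-fast lock acquisition (acquire timeout 0) combined with retries by the caller. The following Python sketch only illustrates that pattern under simplified assumptions. It is not the actual {project_name} or {jdgserver_name} implementation, and the names `Entry`, `LockTimeout`, `update_fail_fast`, `update_with_retry`, as well as the `max_retries` and `on_retry` parameters, are hypothetical.

```python
import threading


class LockTimeout(Exception):
    """Raised when a lock cannot be acquired immediately (acquire timeout 0)."""


class Entry:
    """Hypothetical stand-in for a cached entity such as a user session."""

    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0


def update_fail_fast(entry, delta):
    """Apply an update, failing at once if another writer holds the lock."""
    # blocking=False models an acquire timeout of 0: never wait for the lock.
    if not entry.lock.acquire(blocking=False):
        raise LockTimeout("Unable to acquire lock after 0 milliseconds")
    try:
        entry.value += delta
    finally:
        entry.lock.release()


def update_with_retry(entry, delta, max_retries=10, on_retry=None):
    """Retry a failed update; return the number of attempts that were needed.

    max_retries and on_retry are illustrative assumptions, not real settings.
    """
    for attempt in range(1, max_retries + 1):
        try:
            update_fail_fast(entry, delta)
            return attempt
        except LockTimeout:
            # The caller is told about the failure and simply tries again.
            if on_retry is not None:
                on_retry(attempt)
    raise LockTimeout(f"Update still failing after {max_retries} attempts")


if __name__ == "__main__":
    entry = Entry()
    entry.lock.acquire()  # a concurrent transaction currently holds the lock
    # The concurrent transaction finishes after the first failed attempt.
    attempts = update_with_retry(entry, 1, on_retry=lambda _: entry.lock.release())
    print(f"update applied after {attempts} attempts, value={entry.value}")
```

In this sketch, the first attempt fails because the lock is held, which corresponds to the exception written to the server log, and the retry succeeds once the other writer has released the lock.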