Merge pull request #20 from hmlnarik/KEYCLOAK-3439-database-encoding

KEYCLOAK-3439, KEYCLOAK-3893, KEYCLOAK-3894 - Database UTF-8 settings
2016-12-06 10:38:23 -05:00 · 2016-12-06 10:38:23 -05:00 · 1466b558b4
commit 1466b558b4
parent f55868559d 567ca00fe1
2 changed files with 65 additions and 0 deletions
--- a/SUMMARY.adoc
+++ b/SUMMARY.adoc
@@ -20,6 +20,7 @@
 .. link:topics/database/jdbc.adoc[JDBC Setup]
 .. link:topics/database/datasource.adoc[Datasource Setup]
 .. link:topics/database/hibernate.adoc[Database Configuration]
 .. link:topics/database/unicode-considerations.adoc[Unicode considerations]
 . link:topics/mongo.adoc[Mongo DB Setup]
 . link:topics/network.adoc[Network Setup]
 .. link:topics/network/bind-address.adoc[Bind Addresses]
--- a/topics/database/unicode-considerations.adoc
+++ b/topics/database/unicode-considerations.adoc
@@ -0,0 +1,64 @@
 === Unicode Considerations for Databases
 The database schema in {{book.project.name}} accounts for Unicode strings only in the following special fields:
 * Realms: display name, HTML display name
 * Federation Providers: display name
 * Users: username, given name, last name, attribute names and values
 * Groups: name, attribute names and values
 * Roles: name
 * Descriptions of objects
 Otherwise, characters are limited to those contained in the database encoding, which is often 8-bit. However, for some
 database systems, it is possible to enable UTF-8 encoding of Unicode characters and use the full Unicode character set in all
 text fields. This often comes at the cost of a shorter maximum string length than with 8-bit encodings.
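The length trade-off can be illustrated with a small sketch. It assumes a column whose size limit is counted in bytes (this varies by database), and the class and method names are hypothetical, chosen only for illustration. It compares the number of characters in a string with the number of bytes the same string occupies in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: compares character count with UTF-8 encoded byte count.
final class EncodedLengthDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only:
        // "caf\u00e9" ends in an accented e, the last sample is three CJK characters.
        String[] samples = { "admin", "caf\u00e9", "\u7ba1\u7406\u8005" };
        for (String s : samples) {
            System.out.println(codePoints(s) + " characters, "
                    + utf8Bytes(s) + " bytes in UTF-8");
        }
    }
}
```

Under that assumption, a field that holds a given number of 8-bit characters holds fewer characters once some of them need two or three bytes each.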
 Some databases require special database and/or JDBC driver settings to handle Unicode characters.
 Please find the settings for your database below. Note that if a database is not listed here, it can still work properly
 provided it handles UTF-8 encoding properly at the level of both the database and the JDBC driver.
 Technically, the key criterion for Unicode support in all fields is whether the database allows setting a Unicode
 character set for `VARCHAR` and `CHAR` fields. If it does, there is a high chance that Unicode will work, usually at
 the expense of field length. If it supports Unicode only in `NVARCHAR` and `NCHAR` fields, Unicode support for all text
 fields is unlikely because the Keycloak schema uses `VARCHAR` and `CHAR` fields extensively.
 ==== Oracle Database
 Unicode characters are properly handled provided the database was created with Unicode support in `VARCHAR` and `CHAR`
 fields (e.g. by using the `AL32UTF8` character set as the database character set). No special settings are needed for the JDBC
 driver.
 If the database character set is not Unicode, then to use Unicode characters in the special fields, the JDBC driver needs
 to be configured with the connection property `oracle.jdbc.defaultNChar` set to `true`. It might be wise, though not
 strictly necessary, to also set the `oracle.jdbc.convertNcharLiterals` connection property to `true`. These properties
 can be set either as system properties or as connection properties. Please note that setting `oracle.jdbc.defaultNChar`
 may have a negative impact on performance. For details, please refer to the Oracle JDBC driver configuration documentation.
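As a minimal sketch of the connection-property variant, the snippet below collects the two properties named above in a `java.util.Properties` object. The class name, user name, and password are hypothetical. In a real deployment these properties would more likely be set in the datasource configuration than in application code, so treat this only as an illustration of which keys and values are involved.

```java
import java.util.Properties;

// Illustrative only: gathers the Oracle JDBC properties discussed above.
final class OracleNCharProperties {

    // Builds connection properties for a database whose character set is not Unicode.
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Needed to use Unicode characters in the special fields.
        props.setProperty("oracle.jdbc.defaultNChar", "true");
        // Optional companion setting mentioned in the text.
        props.setProperty("oracle.jdbc.convertNcharLiterals", "true");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical credentials, for illustration only.
        Properties props = connectionProperties("keycloak", "change-me");
        // With the Oracle driver on the classpath, props could be passed to
        // java.sql.DriverManager.getConnection(url, props); the URL is omitted here.
        props.stringPropertyNames().stream()
                .sorted()
                .filter(name -> !name.equals("password"))
                .forEach(name -> System.out.println(name + "=" + props.getProperty(name)));
    }
}
```

The system-property variant would use the same keys, for example via `System.setProperty` or `-D` options on the JVM command line.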
 ==== Microsoft SQL Server Database
 Unicode characters are properly handled only for the special fields. No special settings of the JDBC driver or database are
 necessary.
 ==== IBM DB2 Database
 Unicode characters are properly handled for all fields; length reduction applies to non-special fields. No special
 settings of the JDBC driver or database are necessary.
 ==== MySQL Database
 Unicode characters are properly handled provided the database was created with Unicode support in `VARCHAR` and `CHAR`
 fields in the `CREATE DATABASE` command (e.g. by using the `utf8` character set as the default database character set in
 MySQL 5.5). Please note that the `utf8mb4` character set does not work because its storage requirements differ from those of the `utf8`
 character set footnote:[Tracked as https://issues.jboss.org/browse/KEYCLOAK-3873]. Note that in this case, the length
 restriction on non-special fields does not apply because columns are created to accommodate the given number of characters,
 not bytes. If the database default character set does not allow storing Unicode, only the special fields allow storing
 Unicode values.
 On the JDBC driver side, it is necessary to add the connection property `characterEncoding=UTF-8` to the JDBC
 connection settings.
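The two MySQL settings above can be sketched as follows. The class name, host, port, and database name are hypothetical and only serve the illustration. The snippet assembles a `CREATE DATABASE` statement that uses `utf8` as the default character set and a JDBC URL that carries the `characterEncoding=UTF-8` property.

```java
// Illustrative only: assembles the MySQL settings described above as strings.
final class MySqlUnicodeSettings {

    // Statement that creates a database with utf8 as its default character set.
    static String createDatabaseSql(String databaseName) {
        return "CREATE DATABASE " + databaseName + " CHARACTER SET utf8";
    }

    // JDBC URL with the characterEncoding connection property appended.
    static String jdbcUrl(String host, int port, String databaseName) {
        return "jdbc:mysql://" + host + ":" + port + "/" + databaseName
                + "?characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Hypothetical host, port, and database name.
        System.out.println(createDatabaseSql("keycloak"));
        System.out.println(jdbcUrl("localhost", 3306, "keycloak"));
    }
}
```

The resulting URL would go wherever the datasource's connection URL is configured. If the URL already has a query string, the property would be appended with `&` rather than `?`.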
 ==== PostgreSQL Database
 Unicode is supported when the database character set is `UTF8`. In that case, Unicode characters can be used in any
 field, and there is no reduction of field length for non-special fields. No special settings of the JDBC driver are necessary.