Merge pull request #20 from hmlnarik/KEYCLOAK-3439-database-encoding

KEYCLOAK-3439, KEYCLOAK-3893, KEYCLOAK-3894 - Database UTF-8 settings
2016-12-06 10:38:23 -05:00
commit 1466b558b4
parent f55868559d 567ca00fe1
2 changed files with 65 additions and 0 deletions
--- a/SUMMARY.adoc
+++ b/SUMMARY.adoc
@@ -20,6 +20,7 @@
 .. link:topics/database/jdbc.adoc[JDBC Setup]
 .. link:topics/database/datasource.adoc[Datasource Setup]
 .. link:topics/database/hibernate.adoc[Database Configuration]
+.. link:topics/database/unicode-considerations.adoc[Unicode considerations]
 . link:topics/mongo.adoc[Mongo DB Setup]
 . link:topics/network.adoc[Network Setup]
 .. link:topics/network/bind-address.adoc[Bind Addresses]
--- a/topics/database/unicode-considerations.adoc
+++ b/topics/database/unicode-considerations.adoc
@@ -0,0 +1,64 @@
+
+=== Unicode Considerations for Databases
+
+The database schema in {{book.project.name}} only accounts for Unicode strings in the following special fields:
+
+* Realms: display name, HTML display name
+* Federation Providers: display name
+* Users: username, given name, last name, attribute names and values
+* Groups: name, attribute names and values
+* Roles: name
+* Descriptions of objects
+
+Otherwise, characters are limited to those contained in the database encoding, which is often 8-bit. However, for some
+database systems, it is possible to enable UTF-8 encoding of Unicode characters and use the full Unicode character set
+in all text fields. This often comes at the cost of a shorter maximum string length than with 8-bit encodings.
+
+Some databases require special database and/or JDBC driver settings to be able to handle Unicode characters.
+Please find the settings for your database below. Note that if a database is not listed here, it can still work properly
+provided it handles UTF-8 encoding correctly at the level of both the database and the JDBC driver.
+
+Technically, the key criterion for Unicode support in all fields is whether the database allows setting a Unicode
+character set for `VARCHAR` and `CHAR` fields. If it does, there is a high chance that Unicode will work, usually at
+the expense of field length. If it only supports Unicode in `NVARCHAR` and `NCHAR` fields, Unicode support for all text
+fields is unlikely, as the Keycloak schema uses `VARCHAR` and `CHAR` fields extensively.
+
+==== Oracle Database
+
+Unicode characters are properly handled provided the database was created with Unicode support in `VARCHAR` and `CHAR`
+fields (e.g. by using the `AL32UTF8` character set as the database character set). No special settings are needed for
+the JDBC driver.
+
+If the database character set is not Unicode, then to use Unicode characters in the special fields, the JDBC driver needs
+to be configured with the connection property `oracle.jdbc.defaultNChar` set to `true`. It might be wise, though not
+strictly necessary, to also set the `oracle.jdbc.convertNcharLiterals` connection property to `true`. These properties
+can be set either as system properties or as connection properties. Please note that setting `oracle.jdbc.defaultNChar`
+may have a negative impact on performance. For details, please refer to the Oracle JDBC driver configuration documentation.
+
+==== Microsoft SQL Server Database
+
+Unicode characters are properly handled only for the special fields. No special settings of the JDBC driver or database
+are necessary.
+
+==== IBM DB2 Database
+
+Unicode characters are properly handled for all fields; length reduction applies to non-special fields. No special
+settings of the JDBC driver or database are necessary.
+
+==== MySQL Database
+
+Unicode characters are properly handled provided the database was created with Unicode support in `VARCHAR` and `CHAR`
+fields in the `CREATE DATABASE` command (e.g. by using the `utf8` character set as the default database character set in
+MySQL 5.5). Please note that the `utf8mb4` character set does not work because its storage requirements differ from
+those of the `utf8` character set footnote:[Tracked as https://issues.jboss.org/browse/KEYCLOAK-3873]. Note that in this
+case, the length restriction on non-special fields does not apply because columns are created to accommodate a given
+number of characters, not bytes. If the database default character set does not allow storing Unicode, only the special
+fields allow storing Unicode values.
+
+On the JDBC driver side, it is necessary to add the connection property `characterEncoding=UTF-8` to the JDBC
+connection settings.
+
+==== PostgreSQL Database
+
+Unicode is supported when the database character set is `UTF8`. In that case, Unicode characters can be used in any
+field, and there is no reduction of field length for non-special fields. No special JDBC driver settings are necessary.