=== Unicode Considerations for Databases

The database schema in {{book.project.name}} only accounts for Unicode strings in the following special fields:

* Realms: display name, HTML display name
* Federation Providers: display name
* Users: username, given name, last name, attribute names and values
* Groups: name, attribute names and values
* Roles: name
* Descriptions of objects

Otherwise, characters are limited to those contained in the database encoding, which is often 8-bit. However, for some
database systems, it is possible to enable UTF-8 encoding of Unicode characters and use the full Unicode character set
in all text fields. Often, this comes at the cost of a shorter maximum string length than with 8-bit encodings.

Some databases require special database and/or JDBC driver settings to handle Unicode characters. The settings for
each database are listed below. Note that even if a database is not listed here, it can still work properly provided
it handles UTF-8 encoding properly at both the database and the JDBC driver level.

Technically, the key criterion for Unicode support in all fields is whether the database allows setting a Unicode
character set for `VARCHAR` and `CHAR` fields. If it does, there is a high chance that Unicode will be usable, usually
at the expense of field length. If it only supports Unicode in `NVARCHAR` and `NCHAR` fields, Unicode support for all
text fields is unlikely because the Keycloak schema uses `VARCHAR` and `CHAR` fields extensively.
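
The trade-off between Unicode support and field length can be illustrated with a short sketch. It is not part of
{{book.project.name}}; the class and method names are hypothetical, and it only assumes that a column whose length is
limited in bytes stores its text as UTF-8. Characters outside ASCII then consume more than one byte each, so fewer of
them fit into the same column.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf8Length {
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points in the string.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its own encoding.
        String[] samples = { "admin", "Ji\u0159\u00ed", "\u5c71\u7530" };
        for (String s : samples) {
            System.out.println(codePoints(s) + " characters, " + utf8Bytes(s) + " UTF-8 bytes");
        }
    }
}
```

In this sketch, a five-letter ASCII name needs five bytes, while a two-character CJK name needs six, which is why a
byte-limited column holds fewer characters once multi-byte text is stored in it.
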

==== Oracle Database

Unicode characters are properly handled provided the database was created with Unicode support in `VARCHAR` and `CHAR`
fields (for example, by using the `AL32UTF8` character set as the database character set). No special settings are
needed for the JDBC driver.

If the database character set is not Unicode, then to use Unicode characters in the special fields, the JDBC driver
needs to be configured with the connection property `oracle.jdbc.defaultNChar` set to `true`. It might be wise, though
not strictly necessary, to also set the `oracle.jdbc.convertNcharLiterals` connection property to `true`. These
properties can be set either as system properties or as connection properties. Note that setting
`oracle.jdbc.defaultNChar` may have a negative impact on performance. For details, refer to the Oracle JDBC driver
configuration documentation.
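
As a minimal sketch of the connection-property variant, the two properties named above can be collected in a
`java.util.Properties` object. The class name, method name, user name, and password are hypothetical; `user` and
`password` are the standard JDBC property keys. The sketch only builds the properties and does not open a connection.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
class OracleUnicodeProperties {
    // Builds connection properties for an Oracle database whose character set is not Unicode.
    public static Properties unicodeProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Required for Unicode in the special fields.
        props.setProperty("oracle.jdbc.defaultNChar", "true");
        // Optional, but recommended in the text above.
        props.setProperty("oracle.jdbc.convertNcharLiterals", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = unicodeProperties("keycloak", "change-me");
        // With the Oracle driver on the classpath, props would be passed to
        // java.sql.DriverManager.getConnection(url, props); the URL is deployment specific.
        props.stringPropertyNames().stream()
             .filter(name -> name.startsWith("oracle.jdbc."))
             .sorted()
             .forEach(name -> System.out.println(name + "=" + props.getProperty(name)));
    }
}
```

The same two names can instead be supplied as JVM system properties, for example with `-Doracle.jdbc.defaultNChar=true`
on the Java command line.
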

==== Microsoft SQL Server Database

Unicode characters are properly handled only for the special fields. No special settings of the JDBC driver or database
are necessary.

==== IBM DB2 Database

Unicode characters are properly handled for all fields; length reduction applies to non-special fields. No special
settings of the JDBC driver or database are necessary.

==== MySQL Database

Unicode characters are properly handled provided the database was created with Unicode support in `VARCHAR` and `CHAR`
fields in the `CREATE DATABASE` command (for example, by using the `utf8` character set as the default database
character set in MySQL 5.5). Note that the `utf8mb4` character set does not work because its storage requirements
differ from those of the `utf8` character set footnote:[Tracked as https://issues.jboss.org/browse/KEYCLOAK-3873].
When the database is created with Unicode support in this way, the length restriction on non-special fields does not
apply because columns are created to accommodate the given number of characters, not bytes. If the database default
character set does not allow storing Unicode, only the special fields allow storing Unicode values.

On the JDBC driver side, it is necessary to add the connection property `characterEncoding=UTF-8` to the JDBC
connection settings.
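
The two MySQL settings can be sketched as follows. The class and method names, the host, the port, the database name
`keycloak`, and the `utf8_unicode_ci` collation are assumptions chosen for illustration; only the `utf8` character set
and the `characterEncoding=UTF-8` property come from the text above.

```java
// Hypothetical helper, for illustration only.
class MySqlUnicodeSettings {
    // SQL that creates a database with utf8 as its default character set.
    // The collation is an assumption; pick the one that suits the deployment.
    public static String createDatabaseSql(String database) {
        return "CREATE DATABASE " + database + " CHARACTER SET utf8 COLLATE utf8_unicode_ci";
    }

    // JDBC URL that carries the characterEncoding connection property.
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database + "?characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(createDatabaseSql("keycloak"));
        System.out.println(jdbcUrl("localhost", 3306, "keycloak"));
    }
}
```

If the connection URL already has other parameters, the property would be appended with `&` rather than `?`.
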

==== PostgreSQL Database

Unicode is supported when the database character set is `UTF8`. In that case, Unicode characters can be used in any
field, and there is no reduction of field length for non-special fields. No special settings of the JDBC driver are
necessary.