# Keycloak Datasets ## Provision Keycloak Server Before generating data it is necessary to provision/start Keycloak server. This can be done automatically by running: ``` cd testsuite/performance mvn clean install mvn verify -P provision ``` To tear down the system after testing run: ``` mvn verify -P teardown ``` The teardown step will delete the database as well so it is possible to use it between generating different datasets. It is also possible to start the server externally (manually). In that case it is necessary to provide information in file `tests/target/provisioned-system.properties`. See the main README for details. ## Generate Data To generate the *default dataset* run: ``` cd testsuite/performance mvn verify -P generate-data ``` To generate a *specific dataset* from within the project run: ``` mvn verify -P generate-data -Ddataset= ``` This will load dataset properties from `tests/src/test/resources/dataset/${dataset}.properties`. To generate a specific dataset from a *custom properties file* run: ``` mvn verify -P generate-data -Ddataset.properties.file= ``` To delete a dataset run: ``` mvn verify -P generate-data -Ddataset=… -Ddelete=true ``` This will delete all realms specified by the dataset. ## Indexed Model The model is hierarchical with the parent-child relationships determined by primary foreign keys of entities. Size of the dataset is determined by specifying a "count per parent" parameter for each entity. Number of mappings between entities created by the primary "count per parent" parameters can be speicied by "count per other entity" parameters. Each nested entity has a unique index which identifies it inside its parent entity. For example: - Realm X --> Client Y --> Client Role Z - Realm X --> Client Y --> Resource Server --> Resource Z - Realm X --> User Y - etc. Hash code of each entity is computed based on its index coordinates within the model and its class name. Each entity holds entity representation, and a list of mappings to other entities in the indexed model. The attributes and mappings are initialized by a related *entity template* class. Each entity class also acts as a wrapper around a Keycloak Admin Client using it to provide CRUD operations for its entity. The `id` attribute in the entity representation is set upon entity creation, or in case an already initialized entity was removed from LRU cache it is reloaded from the server. This may happen if the number of entities is larger than entity cache size. (see below) ### Attribute Templating Attributes of each supported entity representation can be set via FreeMarker templates. The process is based on templates defined in a properties configuration file. The first template in the list can use the `index` of the entity and any attributes of its parent entity. Each subsequent attribute template can use any previously set attribute values. Note: Output of FreeMarker engine is always a String. Transition to the actual type of the attribute is done with the Jackson 2.9+ parser using `ObjectMapper.update()` method which allows a gradual updates of an existing Java object. ### Randomness Randomness in the indexed model is deterministic (pseudorandom) because the random seeds are based on deterministic hash codes. There are 2 types of seeds: one is for using randoms in the FreeMarker templates via methods `indexBasedRandomInt(int bound)` and `indexBasedRandomBool(int percentage)`. It is based on class of the current entity + hash code of its parent entity. The other seed is for generating mappings to other entities which are just random sequences of integer indexes. This is based on hash code of the current entity. ### Generator Settings #### Timeouts - `queue.timeout`: How long to wait for an entity to be processed by a thread-pool executor. Default is `60` seconds. You might want to increase this setting when deleting many realms with many nested entities using a low number of workers. - `shutdown.timeout`: How long to wait for the executor thread-pool to shut down. Default is `60` seconds. #### Caching and Memory - `template.cache.size`: Size of cache of FreeMarker template models. Default is `10000`. - `randoms.cache.size`: Size of cache of random integer sequences which are used for mappings between entities. Default is `10000`. - `entity.cache.size`: Size of cache of initialized entities. Default is `100000`. - `max.heap`: Max heap size of the data generator JVM. ## Notes: - Mappings are random so it can sometimes happen that the same mappings are generated multiple times. Only distinct mappings are created. This means for example that if you specify `realmRolesPerUser=5` it can happen that only 4 or less roles will be actually mapped. There is an option to use unique random sequences but is is disabled right now because checking for uniqueness is CPU-intensive. - Mapping of client roles to a user right now is determined by a single parameter: `clientRolesPerUser`. Actually created mappings -- each of which contains specific client + a set of its roles -- is created based on the list of randomly selected client roles of all clients in the realm. This means the count of the actual client mappings isn't predictable. That would require specifying 2 parameters: `clientsPerUser` and `clientRolesPerClientPerUser` which would say how many clients a user has roles assigned from, and the number of roles per each of these clients. - Number of resource servers depends on how the attribute `authorizationServicesEnabled` is set for each client. This means the number isn't specified by any "perRealm" parameter. If this is needed it can be implemented via a random mapping from a resource server entity to a set of existing clients in a similar fashion to how a resource is selected for each resource permission. - The "resource type" attribute for each resource and resource-based permission defaults to the default type of the parent resource server. If it's needed a separate abstract/non-persistable entity ResourceType can be created in the model to represent a set of resource types. The "resource type" attributes can then be set based on random mappings into this set. - Generating large number of users can take a long time with the default realm settings which have the password hashing iterations set to a default value of 27500. If you wish to speed this process up decrease the value of `hashIterations()` in attribute `realm.passwordPolicy`. Note that this will also significantly affect the performance results of the tests because password hashing takes a major part of the server's compute resources. The results may improve even by a factor of 10 or higher when the hashing is set to the minimum value of 1 itreration. However it's on the expense of security.