Add docs for the OpenTelemetry tracing

Closes #31908

Signed-off-by: Martin Bartoš <mabartos@redhat.com>
Co-authored-by: Alexander Schwartz <aschwart@redhat.com>
Co-authored-by: Steven Hawkins <shawkins@redhat.com>
Co-authored-by: Václav Muzikář <vaclav@muzikari.cz>
Martin Bartoš 2024-08-13 07:46:48 +01:00 committed by GitHub
parent a84f3937b9
commit d17a48f8f8
GPG key ID: B5690EEEBB952194
6 changed files with 130 additions and 2 deletions

@ -111,3 +111,11 @@ Now {project_name} allows configuring ECDH-ES, ECDH-ES+A128KW, ECDH-ES+A192KW or
ifeval::[{project_community}==true]
Many thanks to https://github.com/justin-tay[Justin Tay] for the contribution.
endif::[]
= OpenTelemetry Tracing support _(Preview)_
The underlying Quarkus support for OpenTelemetry Tracing is now exposed in {project_name}, which lets you obtain application traces for better observability.
Tracing helps you find performance bottlenecks, determine the cause of application failures, and follow a request through a distributed system.
The support is in preview mode, and we welcome any feedback.
For more information, see the link:{tracingguide_link}[{tracingguide_name}] guide.

@ -36,5 +36,6 @@ https://account.live.com/developers/applications/create
https://developer.twitter.com/apps/
https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#rolling-update
https://stackapps.com/apps/oauth/register
# Remove following lines once KC26 is released
https://www.keycloak.org/server/bootstrap-admin-recovery
https://www.keycloak.org/server/tracing

@ -71,6 +71,8 @@
:gettingstarted_link_latest: https://www.keycloak.org/guides#getting-started
:highavailabilityguide_name: High Availability Guide
:highavailabilityguide_link: https://www.keycloak.org/guides#high-availability
:tracingguide_name: Enabling Tracing
:tracingguide_link: https://www.keycloak.org/server/tracing
:upgradingguide_name: Upgrading Guide
:upgradingguide_name_short: Upgrading
:upgradingguide_link: {project_doc_base_url}/upgrading/

Binary file not shown (image, 78 KiB).

@ -16,6 +16,7 @@ fips
management-interface
health
configuration-metrics
tracing
importExport
vault
all-config

@ -0,0 +1,116 @@
<#import "/templates/guide.adoc" as tmpl>
<#import "/templates/kc.adoc" as kc>
<#import "/templates/options.adoc" as opts>
<#import "/templates/links.adoc" as links>
<#import "/templates/profile.adoc" as profile>
<@tmpl.guide title="Enabling Tracing"
preview="true"
summary="Learn how to enable distributed tracing in {project_name}"
includedOptions="tracing-* log-*-include-trace">
This {section} explains how you can enable and configure distributed tracing in {project_name} by utilizing https://opentelemetry.io/[OpenTelemetry] (OTel).
Tracing allows for detailed monitoring of each request's lifecycle, which helps quickly identify and diagnose issues, leading to more efficient debugging and maintenance.
It also provides valuable insights into performance bottlenecks and can help optimize the system's overall efficiency.
{project_name} uses a supported https://quarkus.io/guides/opentelemetry-tracing[Quarkus OTel extension] that provides smooth integration and exposure of application traces.
== Enable tracing
You can enable the export of traces with the build-time option `tracing-enabled`:
<@kc.start parameters="--tracing-enabled=true"/>
By default, the trace exporters send out data in batches, using the `gRPC` protocol and endpoint `+http://localhost:4317+`.
For more tracing settings, see all possible configurations below.
== Development setup
To view the captured {project_name} traces, you can use a basic setup based on the https://www.jaegertracing.io/[Jaeger] tracing platform.
For development purposes, Jaeger-all-in-one is the easiest way to view traces.
NOTE: Jaeger-all-in-one includes the Jaeger agent, an OTel collector, and the query service/UI.
You do not need to install a separate collector, as you can directly send the trace data to Jaeger.
[source, bash]
----
# Use either podman or docker; the arguments are the same for both.
podman run --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/all-in-one
----
=== Exposed ports
* `:16686` - Jaeger UI
* `:4317` - OpenTelemetry Protocol gRPC receiver (default)
* `:4318` - OpenTelemetry Protocol HTTP receiver
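Once the container is running, a quick reachability check of the published ports can save some debugging time. The following Python sketch assumes the container runs on `localhost` with the port mappings listed above; the `is_port_open` helper is hypothetical and only verifies that something accepts TCP connections on each port.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Port mappings taken from the container command above.
PORTS = {"Jaeger UI": 16686, "OTLP gRPC receiver": 4317, "OTLP HTTP receiver": 4318}

for name, port in PORTS.items():
    state = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```

A port reported as not reachable usually means the container is not running or the `-p` mapping is missing.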
You can visit the Jaeger UI at `+http://localhost:16686/+` to see the tracing information.
The Jaeger UI might look like this with an arbitrary {project_name} trace:
image::jaeger-tracing.png[Jaeger UI]
== Traces in logs
When tracing is enabled, the trace information is included in the log messages of all enabled log handlers (see more in <@links.server id="logging"/>).
This helps you associate log events with the request that produced them, which improves traceability and debugging.
All log lines originating from the same request have the same `traceId` in the log.
The log message also contains a `sampled` flag, which relates to the sampling described below and indicates whether the span was sampled, that is, sent to the collector.
The format of the log records may start as follows:
[source, bash]
----
2024-08-05 15:27:07,144 traceId=b636ac4c665ceb901f7fdc3fc7e80154, parentId=d59cea113d0c2549, spanId=d59cea113d0c2549, sampled=true WARN [org.keycloak.events] ...
----
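As a rough illustration of how these fields can be used, the following Python sketch extracts them from a log line. It assumes the field order and separators shown in the sample record above; the pattern and the `parse_trace_fields` helper are hypothetical and not part of {project_name}.

```python
import re

# Assumption: the fields appear in the order and with the separators
# shown in the sample log record.
TRACE_FIELDS = re.compile(
    r"traceId=(?P<trace_id>[0-9a-f]*), "
    r"parentId=(?P<parent_id>[0-9a-f]*), "
    r"spanId=(?P<span_id>[0-9a-f]*), "
    r"sampled=(?P<sampled>true|false)"
)


def parse_trace_fields(line):
    """Return the trace fields of a log line as a dict, or None if absent."""
    match = TRACE_FIELDS.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the textual flag into a boolean.
    fields["sampled"] = fields["sampled"] == "true"
    return fields


line = (
    "2024-08-05 15:27:07,144 traceId=b636ac4c665ceb901f7fdc3fc7e80154, "
    "parentId=d59cea113d0c2549, spanId=d59cea113d0c2549, sampled=true "
    "WARN [org.keycloak.events] ..."
)
print(parse_trace_fields(line))
```

Grouping log lines by the extracted `trace_id` collects everything logged while a single request was processed.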
=== Hide trace info in logs
You can hide tracing information in specific log handlers by specifying their associated {project_name} option `log-<handler-name>-include-trace`, where `<handler-name>` is the name of the log handler.
For instance, to disable trace info in the `console` log, you can turn it off as follows:
<@kc.start parameters="--tracing-enabled=true --log=console --log-console-include-trace=false"/>
NOTE: When you explicitly override the log format for a particular log handler, the corresponding `*-include-trace` option has no effect, and no tracing information is included.
== Sampling
A sampler decides whether a trace should be discarded or forwarded, which reduces overhead by limiting the number of traces sent to the collector.
This helps manage resource consumption and avoids the high storage costs and potential performance penalty of tracing every single request.
WARNING: For a production-ready environment, sampling should be properly set to minimize infrastructure costs.
{project_name} supports several built-in OpenTelemetry samplers, such as:
<@opts.expectedValues option="tracing-sampler-type"/>
You can change the sampler in use via the `tracing-sampler-type` property.
=== Default sampler
The default sampler for {project_name} is `traceidratio`, which controls the rate of trace sampling based on a specified ratio configurable via the `tracing-sampler-ratio` property.
==== Trace ratio
The default trace ratio is `1.0`, which means all traces are sampled, that is, sent to the collector.
The ratio is a floating-point number in the range `(0,1]`.
For instance, when the ratio is `0.1`, only 10% of the traces are sampled.
WARNING: For a production-ready environment, the trace ratio should be a smaller number to prevent the massive cost of trace store infrastructure and avoid performance overhead.
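To make the effect of the ratio concrete, here is a simplified Python sketch of ratio-based sampling keyed on the trace ID. It is an illustration under stated assumptions, not the code used by {project_name} or the OpenTelemetry SDK: the `should_sample` helper is hypothetical, and comparing the lower 64 bits of the trace ID against the ratio is one common way to implement this kind of sampler.

```python
import random

# Assumption: only the lower 64 bits of the 128-bit trace ID take part
# in the decision.
LOW_BITS = 64


def should_sample(trace_id_hex, ratio):
    """Decide deterministically from the trace ID whether a trace is sampled."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in the range (0,1]")
    low = int(trace_id_hex, 16) & ((1 << LOW_BITS) - 1)
    # Sample when the trace ID falls into the lowest `ratio` share of the ID space.
    return low < ratio * (1 << LOW_BITS)


# The trace ID from the sample log record in "Traces in logs": its low 64 bits
# sit at roughly 12% of the ID space, so it is dropped at ratio 0.1 and kept
# at ratio 0.2.
trace_id = "b636ac4c665ceb901f7fdc3fc7e80154"
print(should_sample(trace_id, 0.1), should_sample(trace_id, 0.2))

# With random trace IDs, roughly 10% of the traces are kept at ratio 0.1.
rng = random.Random(42)
ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]
kept = sum(should_sample(i, 0.1) for i in ids)
print(f"kept {kept} of {len(ids)} traces")
```

Because the decision depends only on the trace ID and the ratio, every span carrying the same trace ID gets the same answer from this function.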
==== Rationale
The `traceidratio` sampler makes its own sampling decisions based on the current ratio of sampled spans, regardless of the decision made on the parent span.
The `parentbased_traceidratio` sampler, in contrast, follows the parent's decision and might seem the preferable default type, as it ensures sampling consistency between parent and child spans.
Specifically, if a parent span is sampled, all its child spans are sampled as well, so the same sampling decision applies to all of them.
This keeps all spans of a trace together and prevents storing incomplete traces.
However, it might introduce security risks that can lead to DoS attacks.
External callers can manipulate trace headers, inject parent spans, and overwhelm the trace store.
Using it would require proper filtering of HTTP headers (especially `tracestate`) and an adequate assessment of caller trust.
For more information, see the https://www.w3.org/TR/trace-context/#security-considerations[W3C Trace context] document.
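The trade-off described above can be sketched in Python. This is a simplified model of a parent-based decision, not the OpenTelemetry implementation: the helper names are hypothetical, and only the `traceparent` header layout (version, trace ID, parent ID, and flags, with the lowest flag bit meaning "sampled") is taken from the W3C Trace Context specification.

```python
import re

# Version 00 of the W3C traceparent header: version-traceid-parentid-flags.
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parent_sampled_flag(traceparent):
    """Return the caller's sampled flag, or None if the header is malformed."""
    match = TRACEPARENT.fullmatch(traceparent.strip())
    if match is None:
        return None
    # The lowest bit of the flags field is the "sampled" flag.
    return bool(int(match.group(3), 16) & 0x01)


def parent_based_decision(traceparent, root_decision):
    """Follow the parent's flag when present; otherwise ask the root sampler."""
    if traceparent is not None:
        flag = parent_sampled_flag(traceparent)
        if flag is not None:
            return flag
    return root_decision()


def never_sample():
    """Stand-in for a root sampler with a very low ratio."""
    return False


# Example header from the W3C specification, with the sampled flag set.
forged = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

print(parent_based_decision(None, never_sample))    # no parent: root sampler decides
print(parent_based_decision(forged, never_sample))  # parent flag overrides the ratio
```

In this model, a caller who always sends a header with the sampled flag set bypasses the configured ratio entirely, which is the risk that a sampler ignoring the parent's decision avoids.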
</@tmpl.guide>