afcbf79582
Closes #32231 Signed-off-by: Martin Bartoš <mabartos@redhat.com>
135 lines
7.1 KiB
Text
135 lines
7.1 KiB
Text
<#import "/templates/guide.adoc" as tmpl>
|
|
<#import "/templates/kc.adoc" as kc>
|
|
<#import "/templates/options.adoc" as opts>
|
|
<#import "/templates/links.adoc" as links>
|
|
<#import "/templates/profile.adoc" as profile>
|
|
|
|
<@tmpl.guide title="Enabling Tracing"
|
|
preview="true"
|
|
summary="Learn how to enable distributed tracing in {project_name}"
|
|
includedOptions="tracing-* log-*-include-trace">
|
|
|
|
This {section} explains how you can enable and configure distributed tracing in {project_name} by utilizing https://opentelemetry.io/[OpenTelemetry] (OTel).
|
|
Tracing allows for detailed monitoring of each request's lifecycle, which helps quickly identify and diagnose issues, leading to more efficient debugging and maintenance.
|
|
|
|
It also provides valuable insights into performance bottlenecks and can help optimize the system's overall efficiency.
|
|
{project_name} uses a supported https://quarkus.io/guides/opentelemetry-tracing[Quarkus OTel extension] that provides smooth integration and exposure of application traces.
|
|
|
|
== Enable tracing
|
|
|
|
It is possible to enable exposing traces using the build time option `tracing-enabled`, and enabling `opentelemetry` feature as follows:
|
|
|
|
<@kc.start parameters="--tracing-enabled=true --features=opentelemetry"/>
|
|
|
|
By default, the trace exporters send out data in batches, using the `gRPC` protocol and endpoint `+http://localhost:4317+`.
|
|
|
|
The default service name is `keycloak`, specified via the `tracing-service-name` property, which takes precedence over `service.name` defined in the `tracing-resource-attributes` property.
|
|
|
|
For more information about resource attributes that can be provided via the `tracing-resource-attributes` property, see the https://quarkus.io/guides/opentelemetry#resource[Quarkus OpenTelemetry Resource] guide.
|
|
|
|
For more tracing settings, see all possible configurations below.
|
|
|
|
== Development setup
|
|
|
|
In order to see the captured {project_name} traces, the basic setup with leveraging the https://www.jaegertracing.io/[Jaeger] tracing platform might be used.
|
|
For development purposes, the Jaeger-all-in-one can be used to see traces as easily as possible.
|
|
|
|
NOTE: Jaeger-all-in-one includes the Jaeger agent, an OTel collector, and the query service/UI.
|
|
You do not need to install a separate collector, as you can directly send the trace data to Jaeger.
|
|
|
|
[source, bash]
|
|
----
|
|
podman|docker run --name jaeger \
|
|
-p 16686:16686 \
|
|
-p 4317:4317 \
|
|
-p 4318:4318 \
|
|
jaegertracing/all-in-one
|
|
----
|
|
|
|
=== Exposed ports
|
|
|
|
* `:16686` - Jaeger UI
|
|
* `:4317` - OpenTelemetry Protocol gRPC receiver (default)
|
|
* `:4318` - OpenTelemetry Protocol HTTP receiver
|
|
|
|
You can visit the Jaeger UI on `+http://localhost:16686/+` to see the tracing information.
|
|
The Jaeger UI might look like this with an arbitrary {project_name} trace:
|
|
|
|
image::jaeger-tracing.png[Jaeger UI]
|
|
|
|
== Traces in logs
|
|
|
|
When tracing is enabled, the trace information is included in the log messages of all enabled log handlers (see more in <@links.server id="logging"/>).
|
|
It can be useful for associating log events to request execution, which might provide better traceability and debugging.
|
|
All log lines originating from the same request will have the same `traceId` in the log.
|
|
|
|
The log message also contains a `sampled` flag, which relates to the sampling described below and indicates whether the span was sampled - sent to the collector.
|
|
|
|
The format of the log records may start as follows:
|
|
|
|
[source, bash]
|
|
----
|
|
2024-08-05 15:27:07,144 traceId=b636ac4c665ceb901f7fdc3fc7e80154, parentId=d59cea113d0c2549, spanId=d59cea113d0c2549, sampled=true WARN [org.keycloak.events] ...
|
|
----
|
|
|
|
=== Hide trace info in logs
|
|
|
|
You can hide tracing information in specific log handlers by specifying their associated {project_name} option `log-<handler-name>-include-trace`, where `<handler-name>` is the name of the log handler.
|
|
For instance, to disable trace info in the `console` log, you can turn it off as follows:
|
|
|
|
<@kc.start parameters="--tracing-enabled=true --features=opentelemetry --log=console --log-console-include-trace=false"/>
|
|
|
|
NOTE: When you explicitly override the log format for the particular log handlers, the `*-include-trace` options do not have any effect, and no tracing is included.
|
|
|
|
== Sampling
|
|
|
|
Sampler decides whether a trace should be discarded or forwarded, effectively reducing overhead by limiting the number of collected traces sent to the collector.
|
|
It helps manage resource consumption, which leads to avoiding the huge storage costs of tracing every single request and potential performance penalty.
|
|
|
|
WARNING: For a production-ready environment, sampling should be properly set to minimize infrastructure costs.
|
|
|
|
{project_name} supports several built-in OpenTelemetry samplers, such as:
|
|
|
|
<@opts.expectedValues option="tracing-sampler-type"/>
|
|
|
|
The used sampler can be changed via the `tracing-sampler-type` property.
|
|
|
|
=== Default sampler
|
|
The default sampler for {project_name} is `traceidratio`, which controls the rate of trace sampling based on a specified ratio configurable via the `tracing-sampler-ratio` property.
|
|
|
|
==== Trace ratio
|
|
The default trace ratio is `1.0`, which means all traces are sampled - sent to the collector.
|
|
The ratio is a floating number in the range `(0,1]`.
|
|
For instance, when the ratio is `0.1`, only 10% of the traces are sampled.
|
|
|
|
WARNING: For a production-ready environment, the trace ratio should be a smaller number to prevent the massive cost of trace store infrastructure and avoid performance overhead.
|
|
|
|
==== Rationale
|
|
|
|
The sampler makes its own sampling decisions based on the current ratio of sampled spans, regardless of the decision made on the parent span,
|
|
as with using the `parentbased_traceidratio` sampler.
|
|
|
|
The `parentbased_traceidratio` sampler could be the preferred default type as it ensures the sampling consistency between parent and child spans.
|
|
Specifically, if a parent span is sampled, all its child spans will be sampled as well - the same sampling decision for all.
|
|
It helps to keep all spans together and prevents storing incomplete traces.
|
|
|
|
However, it might introduce certain security risks leading to DoS attacks.
|
|
External callers can manipulate trace headers, parent spans can be injected, and the trace store can be overwhelmed.
|
|
Proper HTTP headers (especially `tracestate`) filtering and adequate measures of caller trust would need to be assessed.
|
|
|
|
For more information, see the https://www.w3.org/TR/trace-context/#security-considerations[W3C Trace context] document.
|
|
|
|
== Tracing in Kubernetes environment
|
|
When the tracing is enabled when using the {project_name} Operator, certain information about the deployment is propagated to the underlying containers.
|
|
|
|
NOTE: There is no support for tracing configuration in {project_name} CR yet, so the `additionalOptions` can be used to the `tracing-enabled` property and other tracing options.
|
|
|
|
You can filter out the required traces in your tracing backend based on their tags:
|
|
|
|
* `service.name` - {project_name} deployment name
|
|
* `k8s.namespace.name` - Namespace
|
|
* `host.name` - Pod name
|
|
|
|
{project_name} Operator automatically sets the `KC_TRACING_SERVICE_NAME` and `KC_TRACING_RESOURCE_ATTRIBUTES` environment variables for each {project_name} container included in pods it manages.
|
|
|
|
</@tmpl.guide>
|