Observability

To support easier operation and integration in diverse APM products heimdall implements different observability mechanisms. The following sections provide insights about what can be expected.

Logging

Heimdall’s implementation uses zerolog - Zero Allocation JSON Logger, which can however also log in plain text. All emitted log statements include information related to distributed tracing (if tracing is enabled) so that not only log statements can be correlated to traces, but also all log statements belonging to single request/transaction can be correlated as well.

Configuration

Logging configuration can be adjusted in the log property of heimdall’s configuration and supports following properties.

  • format: string (optional)

    With text and gelf being the available formats. text is the default format. And gelf defines a JSON format adhering to GELF.

    Usage of text (default) format is not recommended for production deployments as it requires more computational resources and is hence slow.
    Example 1. Configuring logging to emit logs using GELD format.
    log:
      format: gelf
  • level: string (optional)

    Following log levels are available: trace, debug, info, warn, error, fatal, panic and disabled. The last one effectively disables logging. You usually do not want to do this. By default, the level is set to error.

    debug and trace log levels are not meant for production as setting log level to one of these will result in a high log verbosity and also affect performance pretty much. Both are only meant for setup analysis and debugging! trace log level will also dump all incoming and outgoing HTTP requests and responses, as well as the contents of objects used in templates. That dump is unedited. That means sensitive data will be available in logs.
    Due to some limitations of the GRPC framework capabilities, setting log level to trace will not dump GRPC requests and responses. If you need these for analysis purposes (e.g. when debugging the integration with envoy proxy), you’ll have to set GODEBUG as well as the GRPC_GO_LOG_VERBOSITY_LEVEL and GRPC_GO_LOG_SEVERITY_LEVEL environment variables.
    Example 2. Configuring logging to emit logs in debug level.
    log:
      level: debug

Regular Log Events

If you configure heimdall to log in text format, you can expect output similar to the one shown below:

2022-08-03T12:51:48+02:00 INF Opentelemetry tracing initialized.
2022-08-03T12:51:48+02:00 INF Instantiating in memory cache
2022-08-03T12:51:48+02:00 DBG Creating rule set event queue.
2022-08-03T12:51:48+02:00 INF Loading pipeline definitions
2022-08-03T12:51:48+02:00 DBG Loading definitions for authenticators
2022-08-03T12:51:48+02:00 DBG Loading pipeline definition id=anonymous_authenticator type=anonymous
...
2022-08-03T12:51:52+02:00 DBG Decision endpoint called
2022-08-03T12:51:52+02:00 DBG Executing default rule
2022-08-03T12:51:52+02:00 DBG Authenticating using anonymous authenticator
2022-08-03T12:51:52+02:00 DBG Finalizing using JWT finalizer
2022-08-03T12:51:52+02:00 DBG Generating new JWT
2022-08-03T12:51:52+02:00 DBG Finalizing request
Usage of this format is not recommended for production deployments as it requires more computational resources and is hence slow.

Otherwise, if you configure it to use gelf (see GELF for format details) format, the output will look as follows:

{"_level_name": "INFO", "version":"1.1", "host": "unknown", "timestamp": 1659523288,
 "level": 6, "short_message": "Opentracing tracer initialized."}
{"_level_name": "INFO", "version": "1.1", "host": "unknown", "timestamp": 1659523288,
 "level": 6, "short_message": "Instantiating in memory cache"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523288,
 "level": 7, "short_message": "Creating rule set event queue."}
{"_level_name": "INFO", "version": "1.1", "host": "unknown", "timestamp": 1659523288,
 "level": 6, "short_message": "Loading pipeline definitions"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523288,
 "level": 7,"short_message": "Loading definitions for authenticators"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "id": "anonymous_authenticator",
 "type": "anonymous","timestamp": 1659523288,
 "level": 7, "short_message": "Loading pipeline definition"}

...

{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523295,
 "level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
 "_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Decision endpoint called"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp":1659523295,
 "level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
 "_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Executing default rule"}
{"_level_name": "DEBUG", "version":"1.1", "host": "unknown", "timestamp":1659523295,
 "level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
 "_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Authenticating using anonymous authenticator"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523295,
 "level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
 "_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Finalizing using JWT finalizer"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523295,
 "level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
 "_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Generating new JWT"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523295,
 "level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
 "_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Finalizing request"}

Each log statement does also include the following fields in both log formats for incoming requests if tracing is enabled:

  • _trace_id - The trace id as defined by OpenTelemetry.

  • _span_id - The span id as defined by OpenTelemetry of the current transaction.

  • _parent_id - The span id of the callee, which started the given transaction. Only present if the callee set the corresponding tracing header.

Access Log Events

In addition to regular logs, heimdall emits access logs. These events are always emitted regardless the configured log level and the log level of these events in the log output is always INFO.

Each request to any of heimdall’s endpoints will result in two access events:

  • event describing the start of the transaction, and an

  • event describing the finalization of the transaction.

Following are the fields, which are always set for both events:

  • _tx_start - Timestamp in Unix epoch format, when the transaction started.

  • _client_ip - The IP of the client of the request.

If the event has been emitted for an HTTP request, following fields are set as well:

  • _http_method - The HTTP method used by the client while calling heimdall’s endpoint.

  • _http_path - The used HTTP path.

  • _http_user_agent - The agent used by the client. The value is taken from the HTTP "User-Agent" header.

  • _http_host - The host part of the URI, the client is using while communicating with Heimdall.

  • _http_scheme - The scheme part of the URI, the client is using while communicating with Heimdall.

If the event has been emitted for a GRPC request, following fields are set:

  • _grpc_method - The full GRPC method used.

If the request comes from an intermediary, like e.g. an API Gateway and heimdall is configured to trust that "proxy" (see trusted_proxies configuration of the Decision service, as well as the trusted_proxies configuration of the Proxy service), then following fields will be part of the events as well if the corresponding HTTP headers were sent.

  • _http_x_forwarded_proto - The value of the "X-Forwarded-Proto" header.

  • _http_x_forwarded_host - The value of the "X-Forwarded-Host" header.

  • _http_x_forwarded_uri - The value of the "X-Forwarded-Uri" header.

  • _http_x_forwarded_for - The value of the "X-Forwarded-For" header.

  • _http_forwarded - The value of the "Forwarded" header.

Following are the fields, which are set in the transaction finalization event in addition:

  • _body_bytes_sent - The length of the response body.

  • _tx_duration_ms - The duration of the transaction in milliseconds. If heimdall is operated in proxy mode, it will also include the time used to communicate with the upstream service.

  • _access_granted - Set either to true or false, indicating whether heimdall granted access or not.

  • _subject - The subject identifier if the access was granted.

  • _error - The information about an error, which e.g. led to the denial of the request.

If the finalization event has been emitted for an HTTP request, following fields are set as well:

  • _http_status_code - The numeric HTTP response status code

If the finalization event has been emitted for a GRPC request, following fields are set:

  • _grpc_status_code - The numeric GRPC status code.

Following are the fields, which are set if tracing is enabled:

  • _trace_id - The trace id as defined by OpenTelemetry.

  • _span_id - The span id as defined by OpenTelemetry of the current transaction.

  • _parent_id - The span id of the callee, which started the given transaction. Only present if the callee set the corresponding tracing header.

If you configure heimdall to log in text format, you can expect output as shown below:

2022-08-03T12:40:16+02:00 INF TX started _client_ip=127.0.0.1 _http_host=127.0.0.1:4468 _http_method=GET
 _http_path=/foo _http_scheme=http _http_user_agent=curl/7.74.0 _parent_id=3449bda63ed70206
 _span_id=f57c007257fee0ed _trace_id=00000000000000000a5af97bffe6a8a2 _tx_start=1659523216

....

2022-08-03T12:40:16+02:00 INF TX finished _access_granted=true _body_bytes_sent=0 _client_ip=127.0.0.1
 _http_host=127.0.0.1:4468 _http_method=GET _http_path=/foo _http_scheme=http _http_status_code=202
 _http_user_agent=curl/7.74.0 _subject=anonymous _parent_id=3449bda63ed70206 _span_id=f57c007257fee0ed
 _trace_id=00000000000000000a5af97bffe6a8a2 _tx_duration_ms=0 _tx_start=1659523216

Otherwise, if you configure it to use gelf format, the output will look as follows:

{"_level_name": "INFO", "version":"1.1", "host":"unknown", "_tx_start":1659523295,
 "_client_ip": "127.0.0.1", "_http_method": "GET", "_http_path":"/foo",
 "_http_user_agent": "curl/7.74.0", "_http_host": "127.0.0.1:4468", "_http_scheme": "http",
 "timestamp": 1659523295, "level": 6, "_parent_id": "3449bda63ed70206",
 "_span_id": "f57c007257fee0ed", "_trace_id": "00000000000000000a5af97bffe6a8a2",
 "short_message": "TX started"}

....

{"_level_name": "INFO", "version": "1.1", "host": "unknown", "_tx_start": 1659523295,
 "_client_ip": "127.0.0.1", "_http_method": "GET", "_http_path": "/foo",
 "_http_user_agent": "curl/7.74.0", "_http_host": "127.0.0.1:4468", "_http_scheme": "http",
 "_body_bytes_sent": 0, "_http_status_code":200, "_tx_duration_ms":0, "_subject": "anonymous",
 "_access_granted": true, "timestamp":1659523295, "level": 6, "_parent_id": "3449bda63ed70206",
 "_span_id": "f57c007257fee0ed", "_trace_id": "00000000000000000a5af97bffe6a8a2",
 "short_message": "TX finished"}

Tracing

Heimdall makes use of OpenTelemetry for distributed tracing to support recording of paths taken by requests and supports all environment variables including the defined values according to OpenTelemetry Environment Variables and OpenTelemetry SDK Configuration specifications. In addition to these environment variables, heimdall defines some additional options, which are described below and can be used to tune the behaviour.

Configuration

By default, tracing is enabled and can be disabled or slightly tuned in the tracing property of heimdall’s configuration supporting the following properties.

  • enabled: boolean (optional)

    By making use of this property, you can configure heimdall to enable or disable tracing. Defaults to true (tracing enabled).

    Example 3. Disabling tracing.
    tracing:
      enabled: false
  • span_processor: string (optional)

    With simple and batch being the available options. By setting this property, you can define how heimdall shall process the created spans. Defaults to batch. This property is only there, as there is no corresponding setting option via OTEL tracing environment variables available.

    Example 4. Setting the span processor to export completed spans in batches.
    tracing:
      span_processor: batch

    Description of the options:

    • simple - Configures heimdall to synchronously export the created spans via configured exporters. This is not recommended for production use as it is slow and has a high computation resource usage overhead. It might be however good for testing, debugging, or showing examples of other features.

    • batch - Configures heimdall to send completed span batches to the exporter. Recommended setting for production use.

Tracing Context Propagation

When a request arrives at heimdall, it will create a trace context object based on the received headers, which according to OTEL are the traceparent and tracestate HTTP headers, defined in W3C Trace Context as well as baggage HTTP header, defined in W3C Baggage. The creation of that context, as well as the transformation of it into new HTTP headers, set by heimdall while communicating with further services is named propagation and the components responsible for the creation of such context object are named propagators.

Since not every service in a multi-service system may set or understand the above OTEL specific HTTP headers (as these might still be using tracing vendor specific headers), interoperability can be achieved by configuring the required propagators by making use of the OTEL_PROPAGATORS environment variable. OTEL defines the following values for this variable:

All of these are supported by heimdall. In addition, following propagators can be configured as well:

Configured propagators are used for inbound, as well as for outbound traffic.

Span Exporters

Span Exporter handle the delivery of spans to external receivers (collectors, or agents). This is the final component in the trace export pipeline and typically provided by the APM vendor, like Jaeger, Zipkin, Instana, etc. Since not every multi-service system may have an up to day telemetry receiver supporting protocols defined by OTEL, interoperability can be achieved by configuring the required exporters by making use of the OTEL_TRACES_EXPORTER environment variable. OTEL defines the following values for this variable[2]:

  • otlp - OTLP exporter. Enabled by default if OTEL_TRACES_EXPORTER is not set.

  • zipkin - Zipkin exporter to export spans in Zipkin data model.

  • none - No automatically configured exporter for traces.

All of these are supported by heimdall. In addition, following exporters can be configured as well:

  • instana - Instana exporter to export spans in Instana data model.[3]

Example Configuration

The environment variables set below configure heimdall to use Jaeger propagator and to export the spans via OTLP over grpc to the collector available under http://collector:4317.

OTEL_PROPAGATORS=jaeger
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://collector:4317

If your environment supports OpenTelemetry and usage of defaults is ok, most probably, the only required environment variable to be set might be the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT.

Metrics

Heimdall makes use of OpenTelemetry to emit metrics. Depending on the configuration, push or pull based metrics export is supported.

Configuration

As with tracing, the entire configuration happens via environment variables as defined by OpenTelemetry Environment Variables and OpenTelemetry SDK Configuration specifications. The generated metrics can either be exported by pushing these to an OTEL collector (the default behavior). Alternatively, the available metrics can be pulled and are then exported in the (Prometheus style).

Metrics are generated and exported by default. If this is not desired, you can explicitly disable this behavior within the metrics property of heimdall’s configuration by making use of the following property:

  • enabled: boolean (optional)

    By making use of this property, you can configure heimdall to enable or disable the metrics collection. Defaults to true (metrics exposure is enabled).

    Example 5. Disabling metrics.
    metrics:
      enabled: false

Metric Exporters

By default, metrics are pushed to the OTEL collector using the http/protobuf transport protocol. You can change that behavior by making use of either the OTEL_EXPORTER_OTLP_METRICS_PROTOCOL, or the OTEL_EXPORTER_OTLP_PROTOCOL environment variable.

To let heimdall know where to push the metrics to, either OTEL_EXPORTER_OTLP_METRICS_ENDPOINT, or OTEL_EXPORTER_OTLP_ENDPOINT must be defined.

To let heimdall expose metrics over a pull based service (Prometheus style), the OTEL_METRICS_EXPORTER environment variable must be set to "prometheus". In that case heimdall will expose the 127.0.0.1:9464/metrics endpoint which can be queried using the HTTP GET verb. You can change the host and port of that service by making use of the OTEL_EXPORTER_PROMETHEUS_HOST and OTEL_EXPORTER_PROMETHEUS_PORT environment variables.

You can also disable metrics export by setting the OTEL_METRICS_EXPORTER environment variable to none.

Available Metrics

All, but custom metrics adhere to the OpenTelementry semantic conventions. For that reason, only the custom metrics are listed in the table below.

Metric: certificate.expiry

Number of seconds until a certificate used by a particular service (decision, proxy, management), or mechanism (e.g. jwt finalizer) expires. The metric type is UpDownCounter und the unit is s.

AttributeTypeDescription

service

string

The service, the certificate is configured for.

issuer

string

Issuer DN of the certificate.

serial_nr

string

The serial number of the certificate.

subject

string

Subject DN of the certificate.

dns_names

string

DNS entries in the SAN extension

Runtime Profiling

If enabled, heimdall exposes a /debug/pprof HTTP endpoint on port 10251 (See also the configuration options below) on which runtime profiling data in the profile.proto format (also known as pprof format) can be consumed by APM tools, like Google’s pprof, Grafana Phlare, Pyroscope and many more for visualization purposes. Following information is available:

  • allocs - A sampling of all past memory allocations.

  • block - Stack traces that led to blocking on synchronization primitives.

  • cmdline - The command line invocation of the current program, with arguments separated by NUL bytes.

  • goroutine - Stack traces of all current goroutines.

  • heap - A sampling of memory allocations of live objects.

  • mutex - Stack traces of holders of contended mutexes.

  • profile - Cpu profile. Profiling lasts for duration specified in seconds parameter, or for 30 seconds if not specified

  • symbol - Looks up the program counters listed in the request, responding with a table mapping program counters to function names.

  • threadcreate - Stack traces that led to the creation of new OS threads.

  • trace - Execution trace in binary form. Tracing lasts for duration specified in seconds parameter, or for 1 second if not specified.

See also the API documentation for the documentation of the actual API.

Configuration

Configuration for this service can be adjusted in the profiling property of heimdall’s configuration by making use of the following properties.

  • enabled: boolean (optional)

    By making use of this property, you can configure heimdall to enable or disable runtime profiling. Defaults to false (profiling disabled).

    Example 6. Enabling profiling.
    profiling:
      enabled: true
  • host: string (optional)

    By making use of this property, you can specify the TCP/IP address on which heimdall should listen for connections from client applications interested in profiling data. The entry 0.0.0.0 allows listening for all IPv4 addresses. Defaults to 127.0.0.1, which allow only local TCP/IP “loopback” connections to be made.

    If you operate heimdall in a container, you should set this property to a value, which would allow your APM system to scrape this information.
    Example 7. Configure heimdall to listen on 192.168.2.10.
    profiling:
      host: 192.168.2.10
  • port: integer (optional)

    By making use of this property, you can specify the TCP port the heimdall should listen on. Defaults to 10251.

    Example 8. Configure heimdall to listen on port 9999 for runtime profiling requests.
    profiling:
      port: 9999

1. Datadog supports the OTLP protokoll. For that reason, there is no exporter available.
2. jaeger exporter has been marked as deprecated and is not supported anymore
3. Instana supports the W3C header used by OTEL. For that reason, there is no propagator available.

Last updated on Sep 1, 2024