Observability
Heimdall implements different observability mechanisms to support easier operation and integration with different APM products. The following sections provide insights into what can be expected.
Logging in Heimdall
Heimdall’s implementation uses zerolog - Zero Allocation JSON Logger, which, however, can also log in plain text. All emitted log statements include information related to distributed tracing (if tracing is enabled), so that log statements can not only be correlated with traces, but all log statements belonging to a single request/transaction can also be correlated with each other.
Available Logging configuration options are described in Logging Configuration.
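For orientation, a configuration enabling the gelf format at debug level might look like the following minimal sketch; the exact key names are defined in Logging Configuration and are assumptions here:

# Minimal sketch of heimdall's logging configuration (key names assumed,
# see Logging Configuration for the authoritative reference).
log:
  format: gelf   # structured JSON output; use text for plain text logs
  level: debug   # one of the supported log levels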
Regular Log Events
If you configure heimdall to log in text format, you can expect output similar to the one shown below:

2022-08-03T12:51:48+02:00 INF Opentelemetry tracing initialized.
2022-08-03T12:51:48+02:00 INF Instantiating in memory cache
2022-08-03T12:51:48+02:00 DBG Creating rule set event queue.
2022-08-03T12:51:48+02:00 INF Loading pipeline definitions
2022-08-03T12:51:48+02:00 DBG Loading definitions for authenticators
2022-08-03T12:51:48+02:00 DBG Loading pipeline definition id=noop_authenticator type=noop
2022-08-03T12:51:48+02:00 DBG Loading pipeline definition id=anonymous_authenticator type=anonymous
...
2022-08-03T12:51:52+02:00 DBG Decision endpoint called
2022-08-03T12:51:52+02:00 DBG Executing default rule
2022-08-03T12:51:52+02:00 DBG Authenticating using anonymous authenticator
2022-08-03T12:51:52+02:00 DBG Mutating using JWT unifier
2022-08-03T12:51:52+02:00 DBG Generating new JWT
2022-08-03T12:51:52+02:00 DBG Finalizing request

| Usage of this format is not recommended for production deployments as it requires more computational resources and is hence slow. |
Otherwise, if you configure it to use the gelf format (see GELF for format details), the output will look as follows:
{"_level_name": "INFO", "version":"1.1", "host": "unknown", "timestamp": 1659523288,
"level": 6, "short_message": "Opentracing tracer initialized."}
{"_level_name": "INFO", "version": "1.1", "host": "unknown", "timestamp": 1659523288,
"level": 6, "short_message": "Instantiating in memory cache"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523288,
"level": 7, "short_message": "Creating rule set event queue."}
{"_level_name": "INFO", "version": "1.1", "host": "unknown", "timestamp": 1659523288,
"level": 6, "short_message": "Loading pipeline definitions"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523288,
"level": 7,"short_message": "Loading definitions for authenticators"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "id": "noop_authenticator",
"type": "noop","timestamp": 1659523288,
"level": 7, "short_message": "Loading pipeline definition"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "id": "anonymous_authenticator",
"type": "anonymous","timestamp": 1659523288,
"level": 7, "short_message": "Loading pipeline definition"}
...
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523295,
"level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
"_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Decision endpoint called"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp":1659523295,
"level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
"_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Executing default rule"}
{"_level_name": "DEBUG", "version":"1.1", "host": "unknown", "timestamp":1659523295,
"level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
"_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Authenticating using anonymous authenticator"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523295,
"level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
"_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Mutating using JWT unifier"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523295,
"level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
"_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Generating new JWT"}
{"_level_name": "DEBUG", "version": "1.1", "host": "unknown", "timestamp": 1659523295,
"level": 7, "_parent_id": "3449bda63ed70206", "_span_id": "f57c007257fee0ed",
"_trace_id": "00000000000000000a5af97bffe6a8a2", "short_message": "Finalizing request"}Each log statement does also include the following fields in both log formats for incoming requests if tracing is enabled:
_trace_id- The trace id as defined by OpenTelemetry._span_id- The span id as defined by OpenTelemetry of the current transaction._parent_id- The span id of the callee, which started the given transaction. Only present if the callee set the corresponding tracing header.
Access Log Events
In addition to regular logs, heimdall emits access logs. These events are always emitted regardless of the configured log level, and their level in the log output is always INFO.
Each request to any of heimdall’s endpoints will result in two access events:

an event describing the start of the transaction, and
an event describing the finalization of the transaction.
The following fields are always set for both events:

_tx_start - Timestamp in Unix epoch format when the transaction started.
_client_ip - The IP address of the client which sent the request.
If the event has been emitted for an HTTP request, the following fields are set as well:

_http_method - The HTTP method used by the client while calling heimdall’s endpoint.
_http_path - The used HTTP path.
_http_user_agent - The agent used by the client. The value is taken from the HTTP "User-Agent" header.
_http_host - The host part of the URI the client is using while communicating with heimdall.
_http_scheme - The scheme part of the URI the client is using while communicating with heimdall.
If the event has been emitted for a GRPC request, the following field is set:

_grpc_method - The full GRPC method used.
If the request comes from an intermediary, such as an API gateway, and heimdall is configured to trust that "proxy" (see the trusted_proxies configuration of the Decision service, as well as the trusted_proxies configuration of the Proxy service; a configuration sketch follows the list below), then the following fields will be part of the events as well, provided the corresponding HTTP headers were sent:

_http_x_forwarded_proto - The value of the "X-Forwarded-Proto" header.
_http_x_forwarded_host - The value of the "X-Forwarded-Host" header.
_http_x_forwarded_path - The value of the "X-Forwarded-Path" header.
_http_x_forwarded_uri - The value of the "X-Forwarded-Uri" header.
_http_x_forwarded_for - The value of the "X-Forwarded-For" header.
_http_forwarded - The value of the "Forwarded" header.
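The following minimal sketch illustrates such a trusted_proxies setup; the structure and the CIDR value are assumptions for illustration and should be checked against the Decision and Proxy service configuration references:

# Sketch: trust an API gateway subnet in front of heimdall (values assumed).
serve:
  decision:
    trusted_proxies:
      - 192.168.10.0/24  # hypothetical network of the trusted intermediary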
The following fields are additionally set in the transaction finalization event:

_body_bytes_sent - The length of the response body.
_tx_duration_ms - The duration of the transaction in milliseconds. If heimdall is operated in proxy mode, it also includes the time used to communicate with the upstream service.
_access_granted - Set either to true or false, indicating whether heimdall granted access or not.
_subject - The subject identifier if access was granted.
_error - Information about an error which, e.g., led to the denial of the request.
If the finalization event has been emitted for an HTTP request, the following field is set as well:

_http_status_code - The numeric HTTP response status code.
If the finalization event has been emitted for a GRPC request, the following field is set:

_grpc_status_code - The numeric GRPC status code.
The following fields are set if tracing is enabled:

_trace_id - The trace id as defined by OpenTelemetry.
_span_id - The span id of the current transaction, as defined by OpenTelemetry.
_parent_id - The span id of the caller which started the given transaction. Only present if the caller set the corresponding tracing header.
If you configure heimdall to log in text format, you can expect output as shown below:
2022-08-03T12:40:16+02:00 INF TX started _client_ip=127.0.0.1 _http_host=127.0.0.1:4468 _http_method=GET
_http_path=/foo _http_scheme=http _http_user_agent=curl/7.74.0 _parent_id=3449bda63ed70206
_span_id=f57c007257fee0ed _trace_id=00000000000000000a5af97bffe6a8a2 _tx_start=1659523216
....
2022-08-03T12:40:16+02:00 INF TX finished _access_granted=true _body_bytes_sent=0 _client_ip=127.0.0.1
_http_host=127.0.0.1:4468 _http_method=GET _http_path=/foo _http_scheme=http _http_status_code=202
_http_user_agent=curl/7.74.0 _subject=anonymous _parent_id=3449bda63ed70206 _span_id=f57c007257fee0ed
_trace_id=00000000000000000a5af97bffe6a8a2 _tx_duration_ms=0 _tx_start=1659523216

Otherwise, if you configure it to use the gelf format, the output will look as follows:
{"_level_name": "INFO", "version":"1.1", "host":"unknown", "_tx_start":1659523295,
"_client_ip": "127.0.0.1", "_http_method": "GET", "_http_path":"/foo",
"_http_user_agent": "curl/7.74.0", "_http_host": "127.0.0.1:4468", "_http_scheme": "http",
"timestamp": 1659523295, "level": 6, "_parent_id": "3449bda63ed70206",
"_span_id": "f57c007257fee0ed", "_trace_id": "00000000000000000a5af97bffe6a8a2",
"short_message": "TX started"}
....
{"_level_name": "INFO", "version": "1.1", "host": "unknown", "_tx_start": 1659523295,
"_client_ip": "127.0.0.1", "_http_method": "GET", "_http_path": "/foo",
"_http_user_agent": "curl/7.74.0", "_http_host": "127.0.0.1:4468", "_http_scheme": "http",
"_body_bytes_sent": 0, "_http_status_code":200, "_tx_duration_ms":0, "_subject": "anonymous",
"_access_granted": true, "timestamp":1659523295, "level": 6, "_parent_id": "3449bda63ed70206",
"_span_id": "f57c007257fee0ed", "_trace_id": "00000000000000000a5af97bffe6a8a2",
"short_message": "TX finished"}Tracing in Heimdall
Heimdall makes use of OpenTelemetry for distributed tracing to support recording the paths taken by requests, and supports all environment variables, including their defined values, according to the OpenTelemetry Environment Variable Specification. In addition to these environment variables, heimdall defines its own tracing options, which are described in Tracing Configuration and can be used to tune the behaviour.
| Tracing is enabled by default. |
On the one hand, this chapter serves as a quick introduction to distributed tracing with heimdall; on the other hand, it lists the options that go beyond the standard OTEL definitions.
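For orientation, the heimdall-side tracing options mentioned above might look like the following minimal sketch; the key names are assumptions based on the Tracing Configuration reference and should be verified there:

# Minimal sketch of heimdall's tracing configuration (key names assumed).
tracing:
  enabled: true          # tracing is enabled by default
  span_processor: batch  # batch spans before export instead of exporting each span individually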
Tracing Context Propagation
When a request arrives, heimdall creates a trace context object based on the received headers, which according to OTEL are the traceparent and tracestate HTTP headers, defined in W3C Trace Context, as well as the baggage HTTP header, defined in W3C Baggage. The creation of that context, as well as its transformation into new HTTP headers set by heimdall while communicating with further services, is called propagation, and the components responsible for the creation of such context objects are called propagators.
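For illustration, a W3C traceparent header consists of four dash-separated parts; the values below are example values:

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01

Here 00 is the version, the first hex string is the trace id, the second one is the span id of the caller (surfaced as _parent_id in heimdall’s log statements), and 01 are the trace flags marking the trace as sampled.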
Since not every service in a multi-service system may set or understand the above OTEL specific HTTP headers (as these services might still be using vendor-specific tracing headers), interoperability can be achieved by configuring the required propagators via the OTEL_PROPAGATORS environment variable. OTEL defines the following values for this variable:
tracecontext - W3C Trace Context propagator. Enabled by default if OTEL_PROPAGATORS is not set.
baggage - W3C Baggage propagator. Enabled by default if OTEL_PROPAGATORS is not set.
b3 - Zipkin B3 Single Header propagator.
b3multi - Zipkin B3 Multiple Header propagator.
jaeger - Jaeger Header propagator.
xray - AWS X-Ray Header propagator.
ottrace - OT Trace Header propagator.
none - No automatically configured propagator.
All of these are supported by heimdall. In addition, the following propagator can be configured as well:
datadog - Datadog APM Trace Header propagator.[1]
Configured propagators are used for inbound, as well as for outbound traffic.
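For example, the following setting makes heimdall accept and emit Zipkin B3 headers in addition to the W3C headers; per the OTEL specification, multiple propagators are configured as a comma-separated list:

OTEL_PROPAGATORS=b3,tracecontext,baggage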
Span Exporters
Span exporters handle the delivery of spans to external receivers (collectors or agents). This is the final component in the trace export pipeline and is typically provided by the APM vendor, like Jaeger, Zipkin, Instana, etc. Since not every multi-service system may have an up-to-date telemetry receiver supporting the protocols defined by OTEL, interoperability can be achieved by configuring the required exporters via the OTEL_TRACES_EXPORTER environment variable. OTEL defines the following values for this variable:
otlp - OTLP exporter. Enabled by default if OTEL_TRACES_EXPORTER is not set.
jaeger - Jaeger Thrift exporter to export spans in the Jaeger data model.
zipkin - Zipkin exporter to export spans in the Zipkin data model.
none - No automatically configured exporter for traces.
All of these are supported by heimdall. In addition, the following exporters can be configured as well:
Example Configuration
The environment variables set below configure heimdall to use the Jaeger propagator and to export the spans via OTLP over gRPC to the collector available at http://collector:4317.
OTEL_PROPAGATORS=jaeger
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://collector:4317

If your environment supports OpenTelemetry and the defaults are acceptable, the only environment variable you will most probably need to set is OTEL_EXPORTER_OTLP_TRACES_ENDPOINT.
Metrics in Heimdall
By default, heimdall exposes Prometheus-style metrics on the 127.0.0.1:10250/metrics HTTP endpoint (see also Metrics Configuration), which comprise the following information (a scrape configuration sketch follows the list):
Process information, like CPU, memory, file descriptor usage and start time
Go runtime information, including details about GC, the number of goroutines and OS threads
Information about the metrics endpoint itself, including the number of internal errors encountered while gathering the metrics, the number of current in-flight scrapes and the overall number of scrapes done.
Information about the decision and proxy requests handled, including the total amount and duration of HTTP requests by status code, method and path, as well as information about requests in progress.
Information about expiry for configured certificates.
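As referenced above, a minimal Prometheus scrape configuration for this endpoint might look like the following sketch (assuming Prometheus can reach heimdall under the default address shown above):

# Sketch of a Prometheus scrape job for heimdall's metrics endpoint.
# The default metrics path /metrics is used implicitly.
scrape_configs:
  - job_name: heimdall
    static_configs:
      - targets: ["127.0.0.1:10250"]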
The following table provides detailed information about these metrics:
| Metric | Type | Description |
|---|---|---|
Process Information | ||
| Counter | Total user and system CPU time spent in seconds. |
| Gauge | Maximum number of open file descriptors. |
| Gauge | Number of open file descriptors. |
| Gauge | Resident memory size in bytes. |
| Gauge | Start time of the process since unix epoch in seconds. |
| Gauge | Virtual memory size in bytes. |
| Gauge | Maximum amount of virtual memory available in bytes. |
Go Runtime Information | ||
| Counter | Count of calls made from Go to C by the current process. |
| Counter | Count of completed GC cycles generated by the Go runtime. |
| Counter | Count of completed GC cycles forced by the application. |
| Counter | Count of all completed GC cycles. |
| Summary | A summary of the pause duration of garbage collection cycles. |
| Histogram | Distribution of heap allocations by approximate size. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
| Counter | Cumulative sum of memory allocated to the heap by the application. |
| Counter | Cumulative count of heap allocations triggered by the application. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
| Histogram | Distribution of freed heap allocations by approximate size. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
| Counter | Cumulative sum of heap memory freed by the garbage collector. |
| Counter | Cumulative count of heap allocations whose storage was freed by the garbage collector. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks. |
| Gauge | Heap size target for the end of the GC cycle. |
| Gauge | Number of objects, live or unswept, occupying heap memory. |
| Counter | Count of small allocations that are packed together into blocks. These allocations are counted separately from other allocations because each individual allocation is not tracked by the runtime, only their block. Each block is already accounted for in allocs-by-size and frees-by-size. |
| Gauge | GC cycle the last time the GC CPU limiter was enabled. This metric is useful for diagnosing the root cause of an out-of-memory error, because the limiter trades memory for CPU time when the GC’s CPU time gets too high. This is most likely to occur with use of SetMemoryLimit. The first GC cycle is cycle 1, so a value of 0 indicates that it was never enabled. |
| Histogram | Distribution of individual GC-related stop-the-world pause latencies. |
| Gauge | The stack size of new goroutines. |
| Gauge | Number of goroutines that currently exist. |
| Gauge | Information about the Go environment. |
| Gauge | Memory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime’s estimate of free address space that is backed by physical memory. |
| Gauge | Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector. |
| Gauge | Memory that is completely free and has been returned to the underlying system. This metric is the runtime’s estimate of free address space that is still mapped into the process, but is not backed by physical memory. |
| Gauge | Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use. |
| Gauge | Memory that is reserved for heap objects but is not currently used to hold heap objects. |
| Gauge | Memory that is reserved for runtime mcache structures, but not in-use. |
| Gauge | Memory that is occupied by runtime mcache structures that are currently being used. |
| Gauge | Memory that is reserved for runtime mspan structures, but not in-use. |
| Gauge | Memory that is occupied by runtime mspan structures that are currently being used. |
| Gauge | Memory that is reserved for or used to hold runtime metadata. |
| Gauge | Stack memory allocated by the underlying operating system. |
| Gauge | Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more. |
| Gauge | Memory that is used by the stack trace hash map used for profiling. |
| Gauge | All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes. |
| Gauge | Number of seconds since 1970 of last garbage collection. |
| Gauge | The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously. |
| Gauge | Count of live goroutines. |
| Histogram | Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. |
| Gauge | Number of OS threads created. |
Decision / Proxy / Management Service request related | ||
| Histogram | Duration of all HTTP requests by service (decision, proxy, management), status code, method and path. |
| Gauge | All the requests in progress by service (decision, proxy, management) and method |
| Counter | Count of all HTTP requests by service (decision, proxy, management), status code, method and path. |
| Histogram | Duration of all requests by tunneled HTTP status code, service (decision) and method, as well as by GRPC method and status code. |
| Gauge | All the requests in progress by service (decision), tunneled HTTP method, as well as by GRPC method. |
| Counter | Count of all requests by service (decision), tunneled HTTP status code and method, as well as by GRPC method and status code. |
Certificate expiry information | ||
| Gauge | Number of seconds until a certificate used by a particular service (decision, proxy, management), as well as by the signer, expires. Contains certificate identification information as well. |
Metrics endpoint statistics | ||
| Gauge | Current number of scrapes being served. |
| Counter | Total number of scrapes by HTTP status code. |
| Counter | Total number of internal errors encountered by the promhttp metric handler. |
| More details about Prometheus metric types can be found here. |
Runtime Profiling in Heimdall
If enabled, heimdall exposes a /debug/pprof HTTP endpoint on port 10251 (see also Runtime Profiling Configuration), on which runtime profiling data in the profile.proto format (also known as pprof format) can be consumed by APM tools, like Google’s pprof, Grafana Phlare, Pyroscope and many more, for visualization purposes. The following information is available (a usage sketch follows the list):
allocs - A sampling of all past memory allocations.
block - Stack traces that led to blocking on synchronization primitives.
cmdline - The command line invocation of the current program, with arguments separated by NUL bytes.
goroutine - Stack traces of all current goroutines.
heap - A sampling of memory allocations of live objects.
mutex - Stack traces of holders of contended mutexes.
profile - CPU profile. Profiling lasts for the duration specified in the seconds parameter, or for 30 seconds if not specified.
symbol - Looks up the program counters listed in the request, responding with a table mapping program counters to function names.
threadcreate - Stack traces that led to the creation of new OS threads.
trace - Execution trace in binary form. Tracing lasts for the duration specified in the seconds parameter, or for 1 second if not specified.
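As mentioned above, these profiles can, for example, be consumed with Go’s standard pprof tooling; host and port below correspond to the defaults stated above:

# Collect a 10 second CPU profile and open the interactive pprof web UI
go tool pprof -http=:8080 "http://127.0.0.1:10251/debug/pprof/profile?seconds=10"

# Inspect the current heap profile interactively on the command line
go tool pprof http://127.0.0.1:10251/debug/pprof/heap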
See also the API documentation for the documentation of the actual API.