Skip to content

Add distributed tracing for event-bus-kafka and Datadog #758

@timmc-edx

Description

@timmc-edx

Datadog does not automatically connect the event bus's producer and consumer traces. If we want this sort of distributed tracing, we'll need to add it ourselves.

Implementation notes

Datadog Support confirms that there is no automatic support for connecting the producer's trace to the spans that come out of the consumer's work. However, we can implement this ourselves if we need it. It's not clear what we get automatically if we enable DD_KAFKA_PROPAGATION_ENABLED and what we get additionally from the custom code they provide as an example (which would be split between edx-django-utils and event-bus-kafka).

Confirming that the functionality difference you've described between NR and DD currently does not exist for us OOTB, and would require some custom code to implement. One of our engineering folks provided this example, using the ddtrace propagator class, and using a manual span to house any post-message processing:

from ddtrace import tracer, config
from ddtrace.propagation.http import HTTPPropagator as Propagator

msg = consumer.poll()

ctx = None
if msg is not None and msg.headers():
    # Extract the distributed context from message headers
    ctx = Propagator.extract(dict(msg.headers()))
with tracer.start_span(
    name="kafka-message-processing", # or whatever name they want from the manual span
    service="their service name", # match their main service name
    child_of=ctx if ctx is not None else tracer.context_provider.active(),
    activate=True
):
    # do any db or other operations that you want included in the distributed context
    db.execute()

One important note here: You'll want to ensure for both producer and consumer services, the following environment variable has been set: DD_KAFKA_PROPAGATION_ENABLED=true. Using this, the trace should include both producer and consumer spans as well as later operation spans.

(It would probably be more appropriate for us to use Span Links but those are only available via the OpenTelemetry integration.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions