
Datadog exporter (via OTLP)

Configure the Datadog exporter for tracing


Enable and configure the Datadog exporter for tracing in the router.

For general tracing configuration, refer to Router Tracing Configuration.

OTLP configuration

OpenTelemetry protocol (OTLP) is the recommended protocol for transmitting telemetry, including traces, to Datadog.

To set up trace export to Datadog via OTLP, you must do the following:

  • Modify the default configuration of the Datadog Agent to accept OTLP traces from the router.
  • Configure the router to send traces to the configured Datadog Agent.

Datadog Agent configuration

To configure the Datadog Agent, add OTLP configuration to your datadog.yaml. For example:

datadog.yaml
otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: <dd-agent-ip>:4317

For additional Datadog Agent configuration details, review Datadog's Enabling OTLP Ingestion on the Datadog Agent documentation.

Router configuration

To configure the router, enable the OTLP exporter and set endpoint: <datadog-agent-endpoint>. For example:

router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        preview_datadog_agent_sampling: true
        sampler: 0.1
      otlp:
        enabled: true
        # Optional endpoint, either 'default' or a URL (defaults to http://127.0.0.1:4317)
        endpoint: "${env.DATADOG_AGENT_HOST}:4317"
        # Optional batch processor setting; enables the batch processor to send concurrent requests in a high-load scenario.
        batch_processor:
          max_concurrent_exports: 100

Adjusting the sampler controls the sampling decisions that the router makes on its own. Lowering the sample rate reduces the volume of traces you send, which can have a direct impact on your Datadog bill.

NOTE

If you see warning messages from the router regarding the batch span processor, you may need to adjust your batch_processor settings in your exporter config to match the volume of spans being created in a router instance. This applies to both OTLP and the Datadog native exporters.

Enabling Datadog Agent sampling

The Datadog APM view relies on traces to generate metrics. For these metrics to be accurate, all requests must be sampled and sent to the Datadog agent. To prevent all traces from being sent to Datadog, in your router you must set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.

router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        sampler: 0.1
        preview_datadog_agent_sampling: true

NOTE

  • The router doesn't support [`in-agent` ingestion control](https://docs.datadoghq.com/tracing/trace_pipeline/ingestion_mechanisms/?tab=java#in-the-agent).
  • Configuring `traces_per_second` in the Datadog Agent will not dynamically adjust the router's sampling rate to meet the target rate.
  • Using preview_datadog_agent_sampling will send all spans to the Datadog Agent. This will have an impact on the resource usage and performance of both the router and the Datadog Agent.

Enabling log correlation

To enable Datadog log correlation, you must configure dd.trace_id to appear on the router span:

router.yaml
telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
      router:
        attributes:
          dd.trace_id: true

Your JSON-formatted log messages will automatically include dd.trace_id on each message if dd.trace_id was detected on the router span.
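For illustration, a correlated JSON log line might look like the following. All fields other than dd.trace_id are hypothetical; the exact shape depends on your logging configuration:

```json
{
  "timestamp": "2024-05-01T12:00:00Z",
  "level": "INFO",
  "dd.trace_id": "7538861899386492694",
  "message": "request processed"
}
```

Datadog correlates the log entry with a trace by matching dd.trace_id against ingested traces.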

Datadog native configuration

⚠️ CAUTION

Native Datadog tracing is not part of the OpenTelemetry spec. Because Datadog supports OTLP, native Datadog tracing will be deprecated in a future release. Use the OTLP configuration instead.

The router can be configured to connect to either the native, default Datadog agent address or a URL:

router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        preview_datadog_agent_sampling: true
        sampler: 0.1
      datadog:
        enabled: true
        # Optional endpoint, either 'default' or a URL (defaults to http://127.0.0.1:8126)
        endpoint: "http://${env.DATADOG_AGENT_HOST}:8126"
        # Optional batch processor setting; enables the batch processor to send concurrent requests in a high-load scenario.
        batch_processor:
          max_concurrent_exports: 100
  # Enable the graphql.operation.name attribute on supergraph spans.
  instrumentation:
    spans:
      mode: spec_compliant
      supergraph:
        attributes:
          graphql.operation.name: true

NOTE

Depending on the volume of spans being created in a router instance, you may need to adjust the batch_processor settings in your exporter config. This applies to both the OTLP and Datadog native exporters.

enabled

Set to true to enable the Datadog exporter. Defaults to false.
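As a minimal sketch, enabling the native exporter and leaving every other option at its default:

```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
```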

enable_span_mapping (default: true)

Because of some incompatibilities between Datadog and OpenTelemetry, the Datadog exporter might not provide meaningful contextual information in the exported spans. To fix this, you can configure the router to perform a mapping for the span name and the span resource name.

router.yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        enable_span_mapping: true

With enable_span_mapping: true, the router performs the following mapping:

  1. Use the OpenTelemetry span name to set the Datadog span operation name.
  2. Use the OpenTelemetry span attributes to set the Datadog span resource name.

Example trace

For example, assume a client sends a MyQuery to the router. The router sends a query to my-subgraph-name and creates the following trace:

| apollo_router request |
| apollo_router router |
| apollo_router supergraph |
| apollo_router query_planning | apollo_router execution |
| apollo_router fetch |
| apollo_router subgraph |
| apollo_router subgraph_request |

As you can see, there is no clear information about the name of the query, the name of the subgraph, and the name of the query sent to the subgraph.

Instead, when enable_span_mapping is set to true the following trace will be created:

| request /graphql |
| router /graphql |
| supergraph MyQuery |
| query_planning MyQuery | execution |
| fetch fetch |
| subgraph my-subgraph-name |
| subgraph_request MyQuery__my-subgraph-name__0 |

fixed_span_names (default: true)

When fixed_span_names: true, the router uses the original span names instead of the dynamic ones described by OTel semantic conventions.

router.yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        fixed_span_names: true

This will allow you to have a finite list of operation names in Datadog on the APM view.

resource_mapping

When set, resource_mapping allows you to specify which attribute to use in the Datadog APM and Trace view. The default resource mappings are:

| OpenTelemetry Span Name | Datadog Span Operation Name |
|---|---|
| request | http.route |
| router | http.route |
| supergraph | graphql.operation.name |
| query_planning | graphql.operation.name |
| subgraph | subgraph.name |
| subgraph_request | graphql.operation.name |
| http_request | http.route |

You may override these mappings by specifying the resource_mapping configuration:

router.yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        resource_mapping:
          # Use `my.span.attribute` as the resource name for the `router` span
          router: "my.span.attribute"
  instrumentation:
    spans:
      router:
        attributes:
          # Add a custom attribute to the `router` span
          my.span.attribute:
            request_header: x-custom-header

If you have introduced a new span in a custom build of the router, you can enable resource mapping for it by adding it to the resource_mapping configuration.

span_metrics

When set, span_metrics allows you to specify which spans will show span metrics in the Datadog APM and Trace view. By default, span metrics are enabled for:

  • request
  • router
  • supergraph
  • subgraph
  • subgraph_request
  • http_request
  • query_planning
  • execution
  • query_parsing

You may override these defaults by specifying span_metrics configuration:

For example, the following disables span metrics for the supergraph span and enables them for a custom span:

router.yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        span_metrics:
          # Disable span metrics for supergraph
          supergraph: false
          # Enable span metrics for my_custom_span
          my_custom_span: true

If you have introduced a new span in a custom build of the Router you can enable span metrics for it by adding it to the span_metrics configuration.

batch_processor

All exporters support configuration of a batch span processor with batch_processor.

You must tune your batch_processor configuration if you see any of the following messages in your logs:

  • OpenTelemetry trace error occurred: cannot send span to the batch span processor because the channel is full

  • OpenTelemetry metrics error occurred: cannot send span to the batch span processor because the channel is full

The exact settings depend on the bandwidth available for you to send data to your application performance monitor (APM) and the bandwidth configuration of your APM. Expect to tune these settings over time as your application changes.

telemetry:
  exporters:
    tracing:
      datadog:
        batch_processor:
          max_export_batch_size: 512
          max_concurrent_exports: 1
          max_export_timeout: 30s
          max_queue_size: 2048
          scheduled_delay: 5s

batch_processor configuration reference

| Attribute | Default | Description |
|---|---|---|
| scheduled_delay | 5s | The delay in seconds from receiving the first span to sending the batch. |
| max_concurrent_exports | 1 | The maximum number of overlapping export requests. |
| max_export_batch_size | 512 | The number of spans to include in a batch. May be limited by maximum message size limits. |
| max_export_timeout | 30s | The timeout in seconds for sending spans before dropping the data. |
| max_queue_size | 2048 | The maximum number of spans to be buffered before dropping span data. |

Datadog native configuration reference

| Attribute | Default | Description |
|---|---|---|
| enabled | false | Enable the Datadog exporter. |
| enable_span_mapping | false | Whether span mapping should be used. |
| endpoint | http://localhost:8126/v0.4/traces | The endpoint to send spans to. |
| batch_processor | | The batch processor settings. |
| resource_mapping | See config | A map of span names to attribute names. |
| span_metrics | See config | A map of span names to booleans. |
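Pulling the reference attributes above together, a sketch of a native exporter config (values are illustrative, not recommendations):

```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        enable_span_mapping: true
        endpoint: default
        batch_processor:
          max_concurrent_exports: 10
        resource_mapping:
          router: "http.route"
        span_metrics:
          supergraph: false
```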

Sampler configuration

When using Datadog to gain insight into your router's performance, you need to decide whether to use the Datadog APM view or rely on OTLP metrics. The Datadog APM view is driven by traces. In order for this view to be accurate, all requests must be sampled and sent to the Datadog Agent.

Tracing is expensive both in terms of APM costs and router performance, so you typically will want to set the sampler to sample at low rates in production environments. This, however, impacts the APM view, which will show only a small percentage of traces.

To mitigate this, you can use Datadog Agent sampling mode, where all traces are sent to the Datadog Agent but only a percentage of them are forwarded to Datadog. This keeps the APM view accurate while lowering costs. Note that the router will incur a performance cost of having an effective sample rate of 100%.

Use the following guidelines to configure the sampler and preview_datadog_agent_sampling for the behavior you want:

I want the APM view to show metrics for 100% of traffic, and I am OK with the performance impact on the router.

Set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.

router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # All requests will be traced and sent to the Datadog agent.
        # Only 10 percent of spans will be forwarded from the Datadog agent to Datadog.
        preview_datadog_agent_sampling: true
        sampler: 0.1

I want the Datadog Agent to be in control of the percentage of traces sent to Datadog.

Use the Datadog Agent's probabilistic_sampling option and set the router's sampler to always_on to allow the agent to control the sampling rate.

Router config:

router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # All requests will be traced and sent to the Datadog agent.
        sampler: always_on

Datadog agent config:

otlp_config:
  traces:
    probabilistic_sampling:
      # Only 10 percent of spans will be forwarded to Datadog
      sampling_percentage: 10

I want the best performance from the router and I'm not concerned with the APM view. I use metrics and traces to monitor my application.

Set the sampler to a low value to reduce the number of traces sent to Datadog. Leave preview_datadog_agent_sampling set to false.

router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of requests will be traced and sent to the Datadog agent.
        # The APM view will only show a subset of total request data, but the router will perform better.
        sampler: 0.1
        preview_datadog_agent_sampling: false

sampler (default: always_on)

The sampler configuration allows you to control the sampling decisions that the router will make on its own and decrease the rate at which you sample, which can have a direct impact on your Datadog bill.

router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded to the Datadog agent. Experiment to find a value that is good for you!
        sampler: 0.1

If you are using the Datadog APM view, then you should set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.

preview_datadog_agent_sampling (default: false)

The Datadog APM view relies on traces to generate metrics. For this to be accurate 100% of requests must be sampled and sent to the Datadog agent. To prevent ALL traces from then being sent to Datadog, you must set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.

router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded from the Datadog agent to Datadog. Experiment to find a value that is good for you!
        preview_datadog_agent_sampling: true
        sampler: 0.1

Using preview_datadog_agent_sampling will send all spans to the Datadog Agent, but only the percentage of traces configured by the sampler will be forwarded to Datadog. This means that your APM view will be accurate, but it will incur performance and resource usage costs for both the router and Datadog Agent to send and receive all spans.

If your use case allows your APM view to show only a subset of traces, you can set preview_datadog_agent_sampling to false and instead rely on OTLP metrics to gain insight into the router's performance.

NOTE

  • The router doesn't support [`in-agent` ingestion control](https://docs.datadoghq.com/tracing/trace_pipeline/ingestion_mechanisms/?tab=java#in-the-agent).
  • Configuring `traces_per_second` in the Datadog Agent will not dynamically adjust the router's sampling rate to meet the target rate.