Distributed Trace Support - Jaeger, Cloud Trace, Zipkin

Hi: 

We've stumbled through awkward conversations with Support and our Account Team with respect to Distributed Trace and hope we can get some help here.

With some pain, we have been able to feed Distributed Trace data into a Collector using exporter=JAEGER. We are logging every transaction's Zipkin B3 headers to Splunk, and logging is enabled on our External Load Balancer. From my team's Apigee-centric view, we've left enabling tracing on the Target Backends to the backend teams.

Given that Distributed Trace is pre-GA, and with the understanding that we can get by with it (whatever it is capable of) for now on an ad-hoc basis to diagnose some issues,  we have the following questions:

  1. What does Distributed Trace actually do at this time? The documentation suggests Apigee will participate in an existing trace, but we don't find that to be the case. The trace spans sent by Apigee do not include the parent span of the request that our backend receives from Apigee.
  2. Is there a difference between what is sent for the JAEGER exporter vs. CLOUD_TRACE?
  3. Why does the JAEGER configuration not provide the ability to inject headers into the collector callback? As it stands, we have to stand up an intermediate proxy collector just to inject Authorization.
  4. Understanding there are no promises from Product Management, is there any broad-strokes outlook for when Distributed Tracing will no longer be considered pre-GA? This post suggests it's been sitting in pre-GA for at least 18 months and perhaps much longer.

Thanks,
Andrew

1 ACCEPTED SOLUTION

I think I need the Product team to engage on this question. As you saw, this feature was launched in preview a long while ago. We observed less "traction" and attention on the feature from customers than we had expected. And so, as you observed, this distributed trace feature has languished.

At the same time, the distributed trace standards were evolving and shifting. Jaeger uses OpenCensus, but Cloud Trace doesn't (hence the different parameters in the policy depending on which option you use). Since the feature was launched, the industry has evolved to embrace OpenTelemetry.

Regarding the trace spans, I understand what you're saying. At the time it was released pre-GA, it worked appropriately. Since then, ASM and the other supporting tech have evolved, and that may have caused a change in the behavior w.r.t. spans.

As for injecting authorization... I need the Product team to engage on this question.

Maybe @gsjurseth might have a perspective. 

 

Distributed tracing with Apigee is something I've looked into on behalf of a few customers and while I can't directly explain the behavior, I can observe it.

First off, there is some interaction with various trace headers by the components involved.

  • GCP Load Balancers recommend using `traceparent` but also support `x-cloud-trace-context`. If these headers are not provided by the client, they get added by GCP, or perhaps by the Envoy-based GLB.
  • Apigee uses `x-b3-traceid` and `x-b3-spanid` plus a few others. If these headers are not provided by the client, they get added by Apigee, perhaps by the Envoy-based ingress.

I've been able to correlate an API call from Client --> GCP GLB --> Apigee X --> Cloud Run using GCP Trace Explorer in Cloud Monitoring by setting specific headers.

For example, this call to an Apigee API proxy:

TRACEID=12345678901234567890123456789444
curl https://$HOST/v1/cloud-run-multi-region/internal/v1/hello \
    -H "traceparent: 00-${TRACEID}-1234567890123456-01" \
    -H "x-b3-spanid:1" \
    -H "x-b3-traceid:${TRACEID}"

Results in these headers being used:

    "traceparent": "00-12345678901234567890123456789444-c18510c9d284eab4-01",
    "x-cloud-trace-context": "12345678901234567890123456789444/13944570280229006004;o=1",
    "x-b3-traceid": "12345678901234567890123456789444, 12345678901234567890123456789444",
    "x-b3-spanid": "f304954ef0c72c9a, b48778de95500d42",
    "x-b3-parentspanid": "0000000000000001",
    "x-b3-sampled": "0, 1",
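
The values above can be cross-checked with a short script. This is only a minimal sketch in Python, not something Apigee or GCP provides: the helper names (`parse_traceparent`, `parse_cloud_trace_context`) are hypothetical, and it assumes the usual layouts of `version-traceid-spanid-flags` for `traceparent` and `TRACE_ID/SPAN_ID;o=OPTIONS` for `x-cloud-trace-context`. It checks that every header family carries the same trace ID, and that the span ID in `x-cloud-trace-context` is the decimal form of the hex span ID in `traceparent`.

```python
# Header values copied from the output above.
observed = {
    "traceparent": "00-12345678901234567890123456789444-c18510c9d284eab4-01",
    "x-cloud-trace-context": "12345678901234567890123456789444/13944570280229006004;o=1",
    "x-b3-traceid": "12345678901234567890123456789444, 12345678901234567890123456789444",
    "x-b3-parentspanid": "0000000000000001",
}

def parse_traceparent(value):
    # Assumed layout: version-traceid-spanid-flags (all hex).
    version, trace_id, span_id, flags = value.split("-")
    return {"trace_id": trace_id, "span_id": span_id, "flags": flags}

def parse_cloud_trace_context(value):
    # Assumed layout: TRACE_ID/SPAN_ID;o=OPTIONS, with SPAN_ID in decimal.
    trace_id, _, rest = value.partition("/")
    span_id, _, options = rest.partition(";")
    return {"trace_id": trace_id, "span_id": int(span_id), "options": options}

w3c = parse_traceparent(observed["traceparent"])
gcp = parse_cloud_trace_context(observed["x-cloud-trace-context"])

# Repeated headers are shown comma-joined; collapse them into a set.
b3_trace_ids = {v.strip() for v in observed["x-b3-traceid"].split(",")}

same_trace = w3c["trace_id"] == gcp["trace_id"] and b3_trace_ids == {w3c["trace_id"]}
same_span = int(w3c["span_id"], 16) == gcp["span_id"]

print("same trace ID in all headers:", same_trace)
print("traceparent span == x-cloud-trace-context span:", same_span)
```

Both checks print `True` for the values above. Note also that `x-b3-parentspanid` (`0000000000000001`) is the zero-padded `x-b3-spanid:1` sent by the client, and that the comma-joined B3 values suggest those headers appear more than once in the request.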

I can then see the correlated trace in Cloud Trace Explorer:

[Screenshot: kurtkanaskie_0-1721243844885.png]
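
To repeat the experiment without reusing the hard-coded trace ID, the headers can be generated per request. The following is a rough sketch in Python under a few assumptions: the function name `build_trace_headers` is hypothetical, the same span ID is used for both `traceparent` and `x-b3-spanid` (the curl example above used different values), and the `01` flag marks the request as sampled.

```python
import secrets

def build_trace_headers(trace_id=None, span_id=None):
    """Return traceparent and B3 headers that share one trace ID."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = span_id or secrets.token_hex(8)     # 16 hex characters
    return {
        "traceparent": f"00-{trace_id}-{span_id}-01",
        "x-b3-traceid": trace_id,
        "x-b3-spanid": span_id,
    }

# Print the headers as curl -H arguments for a fresh, random trace.
for name, value in build_trace_headers().items():
    print(f'-H "{name}: {value}"')
```

Searching Cloud Trace Explorer for the generated trace ID should then locate the request, assuming the same correlation behavior described above.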