CCSDK-4010

TLDR: Tracing has been added to the A1 Policy Management Service. It is disabled by default. To enable it, change the following flag in application.yaml:

otel:
  sdk:
    disabled: false

or set the environment variables:
ONAP_SDK_DISABLED=false
ONAP_TRACING_SOUTHBOUND=true
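
As a rough illustration of how these two variables relate to each other, the plain-Java sketch below resolves them from an environment map. It is not the service's actual configuration code: the class TracingFlags is hypothetical, the fallback for an unset ONAP_TRACING_SOUTHBOUND is assumed to be false purely for the example, and making southbound tracing depend on the SDK flag is likewise an assumption.

```java
import java.util.Map;

// Hypothetical helper, not part of the A1 Policy Management Service.
final class TracingFlags {
    private TracingFlags() {}

    // Tracing stays disabled unless ONAP_SDK_DISABLED is explicitly "false"
    // (tracing is disabled by default).
    static boolean tracingEnabled(Map<String, String> env) {
        String disabled = env.getOrDefault("ONAP_SDK_DISABLED", "true");
        return "false".equalsIgnoreCase(disabled.trim());
    }

    // Assumption: southbound tracing is only considered when tracing itself
    // is enabled, and an unset ONAP_TRACING_SOUTHBOUND counts as false.
    static boolean southboundTracingEnabled(Map<String, String> env) {
        String southbound = env.getOrDefault("ONAP_TRACING_SOUTHBOUND", "false");
        return tracingEnabled(env) && Boolean.parseBoolean(southbound.trim());
    }
}
```

Calling TracingFlags.tracingEnabled(System.getenv()) would apply the same check to the real process environment.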

Tracing Test


Attachments: application_configuration.json.nosdnc, docker-compose.yaml



a) A docker-compose.yaml with a1pms, a1-osc-simulator, and jaeger, which acts as both collector and exporter

version: '3.7'
services:
  a1_policy_management:
    container_name: a1-pms
    image: onap/ccsdk-oran-a1policymanagementservice:1.7.0-SNAPSHOT
    ports:
      - "8433:8433"
      - "8081:8081"
    volumes:
      - ./application_configuration.json.nosdnc:/opt/app/policy-agent/data/application_configuration.json:ro
    networks:
      - jaeger-example
    depends_on:
      - jaeger
    environment:
      - ONAP_SDK_DISABLED=false
      - ONAP_TRACING_SOUTHBOUND=true
      - ONAP_OTEL_SAMPLER_JAEGER_REMOTE_ENDPOINT=http://jaeger:14250
      - ONAP_OTEL_EXPORTER_ENDPOINT=http://jaeger:4317
      - ONAP_OTEL_EXPORTER_PROTOCOL=grpc
      - ONAP_OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc

  a1-sim-OSC:
    image: "nexus3.o-ran-sc.org:10002/o-ran-sc/a1-simulator:2.1.0"
    container_name: a1-sim-OSC
    ports:
      - "30001:8085"
      - "30002:8185"
    environment:
      - A1_VERSION=OSC_2.1.0
      - REMOTE_HOSTS_LOGGING=1
      - ALLOW_HTTP=true
    networks:
      - jaeger-example

  jaeger:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger
    ports:
      - "16686:16686"
      - "14250:14250"
      - "14268:14268"
      - "4317:4317"
      - "4318:4318"
    environment:
      - JAEGER_DISABLED=true
      - LOG_LEVEL=debug
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - jaeger-example

networks:
  jaeger-example:
    driver: bridge



b) The application_configuration.json.nosdnc in the same folder



{
    "description":"Application configuration",
    "config":{
       "ric":[
          {
             "name":"ric1",
             "baseUrl":"https://a1-sim-OSC:8185/",
             "managedElementIds":[
                "kista_1",
                "kista_2"
             ]
          }
       ]
    }
 }




c) Creating a PolicyType in the simulator

curl -v -X 'PUT' \
   'http://localhost:30001/a1-p/policytypes/1' \
   -H 'accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{
    "name":"pt1",
    "description":"pt1 policy type",
    "policy_type_id":1,
    "create_schema":{
       "$schema":"http://json-schema.org/draft-07/schema#",
       "title":"OSC_Type1_1.0.0",
       "description":"Type 1 policy type",
       "type":"object",
       "properties":{
          "scope":{
             "type":"object",
             "properties":{
                "ueId":{
                   "type":"string"
                },
                "qosId":{
                   "type":"string"
                }
             },
             "additionalProperties":false,
             "required":[
                "ueId",
                "qosId"
             ]
          },
          "qosObjectives":{
             "type":"object",
             "properties":{
                "priorityLevel":{
                   "type":"number"
                }
             },
             "additionalProperties":false,
             "required":[
                "priorityLevel"
             ]
          }
       },
       "additionalProperties":false,
       "required":[
          "scope",
          "qosObjectives"
       ]
    }
}'


d) Creating a policy in a1-pms once the policy type has been successfully registered (check with: curl http://localhost:8081/a1-policy/v2/policy-types)

curl -v -X 'PUT' \
  'http://localhost:8081/a1-policy/v2/policies' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "ric_id": "ric1",
  "policy_id": "aa8feaa88d944d919ef0e83f2172a51002",
  "transient": false,
  "service_id": "controlpanel",
    "policy_data": {
        "scope": {
            "ueId": "ue5100",
            "qosId": "qos5100"
        },
        "qosObjectives": {
            "priorityLevel": 5100.0
        }
    },
  "status_notification_uri": "http://callback-receiver:8090/callbacks/test",
  "policytype_id": "1"
}'


e) Load the Jaeger UI at http://localhost:16686/ and look up the a1-pms traces; a sample of the last call would be:


Steps Taken and Challenges:


Adding Telemetry to a1policymanagementservice: The application uses the WebClient from Spring WebFlux to call the southbound interface from the northbound interface (an A1-OSC simulator has been used for the southbound side).

https://opentelemetry.io/docs/zero-code/java/spring-boot-starter/out-of-the-box-instrumentation/#spring-webflux-autoconfiguration

The OpenTelemetry documentation provides a bean that mutates the default WebClient builder to add tracing filters.

In our case the AsyncRestClient manually builds a WebClient for every asynchronous request.

The challenge was to add the tracing filters to this non-Spring class.

1. Adding OpenTelemetry Bean

    @Bean
    public OpenTelemetry openTelemetry() {
        return AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk();
    }

This introduced a circular dependency: openTelemetryConfig defined in URL [jar:file:/opt/app/policy-agent/a1-policy-management-service.jar!/BOOT-INF/classes!/org/onap/ccsdk/oran/a1policymanagementservice/configuration/OpenTelemetryConfig.class

2. Adding the filters directly in AsyncRestClient rather than in the builder bean. However, AutoConfiguredOpenTelemetrySdk uses default parameters such as localhost:4317 for the gRPC export, so we opted to build the exporter beans from the application.yaml parameters instead.

AsyncRestClient.java
        ...
        OpenTelemetry openTelemetry = AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk();
        var webfluxTelemetry = SpringWebfluxTelemetry.builder(openTelemetry).build();
        return WebClient.builder() //
                ...
                .filters(webfluxTelemetry::addClientTracingFilter)
                .build();

3. A context provider class to get the ApplicationContext into non-Spring components

import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.stereotype.Component;

@Component
public class ApplicationContextProvider implements ApplicationContextAware {
    private static ApplicationContext context;
 
    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        context = applicationContext;
    }
 
    public static ApplicationContext getApplicationContext() {
        return context;
    }
}

Then, in the non-Spring class, use var context = ApplicationContextProvider.getApplicationContext().getBean(OtelConfig.class); and, if tracing is enabled, add the tracing filters.
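
A static holder of this kind is only populated once the container has called the setter, so any caller that may run earlier has to tolerate an empty holder. The following is a minimal plain-Java sketch of that guard; it does not use Spring, and the names Holder, setContext and lookup are hypothetical stand-ins for illustration only.

```java
import java.util.Optional;

// Plain-Java stand-in for a static context holder (hypothetical names).
final class Holder {
    // Written once by the container, read by non-managed classes.
    private static volatile Object context;

    private Holder() {}

    static void setContext(Object ctx) {
        context = ctx;
    }

    // Returns an empty Optional instead of null while the holder is unset.
    static Optional<Object> lookup() {
        return Optional.ofNullable(context);
    }
}
```

A caller could then write Holder.lookup().ifPresent(ctx -> ...) and skip the tracing filters when the holder has not been initialized yet.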

4. The ApplicationContextProvider class was removed because it can cause issues in different environments: in rare cases the context was still null at start-up (if the dependent classes were initialized first). The approach therefore changed to wrapping the AsyncWebClient build function in a @Service that receives the SpringWebfluxTelemetry bean via @Autowired(required = false), in case telemetry is disabled and the bean is not created.

@Service
@DependsOn({"otelConfig"})
public class WebClientUtil {
    private static OtelConfig otelConfig;
    private static SpringWebfluxTelemetry springWebfluxTelemetry;

    // springWebfluxTelemetry is null when telemetry is disabled and the bean does not exist
    public WebClientUtil(OtelConfig otelConfig, @Autowired(required = false) SpringWebfluxTelemetry springWebfluxTelemetry) {
        WebClientUtil.otelConfig = otelConfig;
        if (otelConfig.isTracingEnabled()) {
            WebClientUtil.springWebfluxTelemetry = springWebfluxTelemetry;
        }
    }
    // WebClient build methods omitted
}
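
The idea of an optional collaborator that is only applied when tracing is enabled can be sketched without Spring as follows. This is an illustration under assumptions, not the service's code: ClientBuilder, TracingSupport and ClientFactory are hypothetical stand-ins for WebClient.Builder, SpringWebfluxTelemetry and WebClientUtil, and the sketch uses instance fields instead of static ones to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a client builder that collects filters.
final class ClientBuilder {
    final List<String> filters = new ArrayList<>();
}

// Hypothetical stand-in for the optional telemetry bean.
interface TracingSupport {
    void addClientTracingFilter(List<String> filters);
}

final class ClientFactory {
    private final boolean tracingEnabled;
    private final TracingSupport tracing; // may be null when telemetry is disabled

    ClientFactory(boolean tracingEnabled, TracingSupport tracing) {
        this.tracingEnabled = tracingEnabled;
        this.tracing = tracing;
    }

    // Adds the tracing filter only if tracing is on and the collaborator exists.
    ClientBuilder newBuilder() {
        ClientBuilder builder = new ClientBuilder();
        if (tracingEnabled && tracing != null) {
            tracing.addClientTracingFilter(builder.filters);
        }
        return builder;
    }
}
```

With this shape, new ClientFactory(false, null).newBuilder() yields a builder without filters, so the disabled case needs no special handling by callers.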


5. We used opentelemetry-spring-boot-starter: we noticed that more information is traced automatically when this dependency is enabled, so we control it in application.yaml under the otel properties.


NOTES:

1. Using the ObservationRegistryCustomizer would still track manual /actuator calls, but it was kept in to keep the unit tests running.


    ObservationRegistryCustomizer<ObservationRegistry> skipActuatorEndpointsFromObservation() {
        PathMatcher pathMatcher = new AntPathMatcher("/");
        return registry -> registry.observationConfig().observationPredicate(observationPredicate(pathMatcher));
    }

    static ObservationPredicate observationPredicate(PathMatcher pathMatcher) {
        return (name, context) -> {
            if (context instanceof ServerRequestObservationContext observationContext) {
                return !pathMatcher.match("/actuator/**", observationContext.getCarrier().getRequestURI());
            } else {
                return !SCHEDULED_TASK_NAME.equals(name);
            }
        };
    }
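
The predicate returns true when a request should be observed and false when it should be skipped. As a rough plain-Java sketch of the same decision, with a simple prefix check standing in for Ant-style path matching (the class ActuatorFilter is hypothetical and not part of the service):

```java
import java.util.function.Predicate;

// Hypothetical helper; a prefix check stands in for Ant-style matching.
final class ActuatorFilter {
    private ActuatorFilter() {}

    // true = observe the request URI, false = skip it.
    static Predicate<String> observeUnlessActuator() {
        return uri -> !(uri.equals("/actuator") || uri.startsWith("/actuator/"));
    }
}
```

For example, ActuatorFilter.observeUnlessActuator().test("/actuator/health") evaluates to false, while a policy request path is still observed.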


It is worth mentioning that, when using the Spring Boot auto-configuration:

  <dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
  </dependency>

you can follow the steps described here:
https://opentelemetry.io/docs/zero-code/java/spring-boot-starter/sdk-configuration/#exclude-actuator-endpoints-from-tracing


2. To retrieve multiple spans and to enable automatic context propagation to the ThreadLocals used by Flux and Mono operators, we call Hooks.enableAutomaticContextPropagation(); but only if tracing is enabled.

https://docs.micrometer.io/context-propagation/reference/index.html 

  <dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>context-propagation</artifactId>
  </dependency>
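
The reason propagation is needed at all is that a ThreadLocal value set on one thread is not visible on another, and reactive operators may hop threads. The sketch below shows that effect with the plain JDK only; it uses neither Reactor nor Micrometer, and ThreadLocalDemo and TRACE_ID are hypothetical names for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalDemo {
    // Stand-in for tracing state kept in a ThreadLocal.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    private ThreadLocalDemo() {}

    // Reads TRACE_ID on a worker thread without any propagation.
    static String readOnWorker() {
        return runOnWorker(TRACE_ID::get);
    }

    // Captures the caller's value and restores it on the worker thread.
    static String readOnWorkerWithPropagation() {
        String captured = TRACE_ID.get();
        return runOnWorker(() -> {
            TRACE_ID.set(captured);
            try {
                return TRACE_ID.get();
            } finally {
                TRACE_ID.remove(); // do not leak state into the worker thread
            }
        });
    }

    private static String runOnWorker(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

After TRACE_ID.set("abc") on the calling thread, readOnWorker() returns null while readOnWorkerWithPropagation() returns "abc"; automatic context propagation performs this capture-and-restore step without manual code.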

3. When telemetry was disabled, micrometer-tracing-bridge-otel would still try to export spans, so we decided to use one flag to rule them both (Micrometer and OpenTelemetry).

The flag controlling it is

management:
  tracing:
    enabled: true

Example of polluted logs when disabling only opentelemetry beans:

2024-06-16 18:55:19 2024-06-16 17:55:19.060 [TRACE] [BatchSpanProcessor_WorkerThread-1] i.m.t.o.b.Slf4JBaggageEventListener - Got scope attached event [ScopeAttached{context: [span: null] [baggage: null]}]
2024-06-16 18:55:19 2024-06-16 17:55:19.060 [TRACE] [BatchSpanProcessor_WorkerThread-1] i.m.t.o.b.Slf4JEventListener - Got scope changed event [ScopeAttached{context: [span: null] [baggage: null]}]
2024-06-16 18:55:19 2024-06-16 17:55:19.060 [TRACE] [BatchSpanProcessor_WorkerThread-1] i.m.t.o.b.Slf4JBaggageEventListener - Got scope closed event [io.micrometer.tracing.otel.bridge.EventPublishingContextWrapper$ScopeClosedEvent@56db4345]
2024-06-16 18:55:19 2024-06-16 17:55:19.060 [TRACE] [BatchSpanProcessor_WorkerThread-1] i.m.t.o.b.Slf4JEventListener - Got scope closed event [io.micrometer.tracing.otel.bridge.EventPublishingContextWrapper$ScopeClosedEvent@56db4345]
2024-06-16 18:55:19 2024-06-16 17:55:19.061 [TRACE] [BatchSpanProcessor_WorkerThread-1] i.m.t.o.b.Slf4JBaggageEventListener - Got scope restored event [ScopeRestored{context: [span: null] [baggage: null]}]
2024-06-16 18:55:19 2024-06-16 17:55:19.061 [TRACE] [BatchSpanProcessor_WorkerThread-1] i.m.t.o.b.Slf4JEventListener - Got scope restored event [ScopeRestored{context: [span: null] [baggage: null]}]
2024-06-16 18:55:19 2024-06-16 17:55:19.062 [ERROR] [OkHttp http://localhost:4318/...] i.o.e.i.h.HttpExporter - Failed to export spans. The request could not be executed. Full error message: Failed to connect to localhost/127.0.0.1:4318
2024-06-16 18:55:19 java.net.ConnectException: Failed to connect to localhost/127.0.0.1:4318
2024-06-16 18:55:19     at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:297)
2024-06-16 18:55:19     at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207)
2024-06-16 18:55:19     at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226)
2024-06-16 18:55:19     at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)
2024-06-16 18:55:19     at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)
2024-06-16 18:55:19     at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)
2024-06-16 18:55:19     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
2024-06-16 18:55:19     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
2024-06-16 18:55:19     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
2024-06-16 18:55:19     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
2024-06-16 18:55:19     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
2024-06-16 18:55:19     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
2024-06-16 18:55:19     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
2024-06-16 18:55:19     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
2024-06-16 18:55:19     at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
2024-06-16 18:55:19     at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
2024-06-16 18:55:19     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2024-06-16 18:55:19     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2024-06-16 18:55:19     at java.base/java.lang.Thread.run(Thread.java:833)
2024-06-16 18:55:19 Caused by: java.net.ConnectException: Connection refused
2024-06-16 18:55:19     at java.base/sun.nio.ch.Net.pollConnect(Native Method)
2024-06-16 18:55:19     at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
2024-06-16 18:55:19     at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
2024-06-16 18:55:19     at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
2024-06-16 18:55:19     at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
2024-06-16 18:55:19     at java.base/java.net.Socket.connect(Socket.java:633)
2024-06-16 18:55:19     at okhttp3.internal.platform.Platform.connectSocket(Platform.kt:128)
2024-06-16 18:55:19     at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:295)
2024-06-16 18:55:19     ... 18 common frames omitted
2024-06-16 18:55:19 2024-06-16 17:55:19.066 [DEBUG] [BatchSpanProcessor_WorkerThread-1] i.o.s.t.e.BatchSpanProcessor - Exporter failed