What is Culprit Locator

Culprit Locator (or root-cause finder) is a set of ELK-based dashboards, built on ONAP logs, that identify where a problem originated within a failed flow. It is intended to help testers and developers move from identifying an issue to diagnosing and fixing it more quickly, and it demonstrates how traceable logs can support troubleshooting. The current version relies on explicit ERROR logs to locate the source of a problem by component, subcomponent, or a series of significant log details.

Features

  • Given a RequestId (or TransactionId), retrieve all relevant logs across all components for investigation
  • Aggregate and drill down through the logs by log level, component/subcomponent, and error message pattern
  • Highlight meaningful logs with color coding so that process flows are easier to follow
  • Guide the user quickly to the meaningful logs around an issue
  • Provide a drill-down link for further investigation of the logs at the subcomponent level

Data Requirements (logstash)

We assume the index pattern "logstash-*" has been created in Kibana, either by default or manually (refer to Logging User Guide Dashboard), and that the following data fields are available from the latest version of onap-pipeline.conf:

  • Timestamp (date formatted)
  • RequestId (UUID)
  • componentName (string)
  • subComponent (string)
  • loglevel (one of INFO, DEBUG, WARN, ERROR)
  • detailmessage (which is equivalent to the field message but aggregatable)
  • source (log file path)
  • Logger (logging object)
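
To make the role of these fields concrete, the sketch below models a log document in plain Java and counts ERROR entries per component/subcomponent for one RequestId, which is roughly the aggregation the dashboards perform in Kibana. The class `CulpritSketch`, the `LogEntry` type, and the sample values are hypothetical and only illustrate the idea; they are not part of ONAP or of the dashboards.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CulpritSketch {
    // Hypothetical in-memory view of one log document (subset of the fields listed above).
    static final class LogEntry {
        final String requestId;
        final String componentName;
        final String subComponent;
        final String loglevel;
        final String detailmessage;

        LogEntry(String requestId, String componentName, String subComponent,
                 String loglevel, String detailmessage) {
            this.requestId = requestId;
            this.componentName = componentName;
            this.subComponent = subComponent;
            this.loglevel = loglevel;
            this.detailmessage = detailmessage;
        }
    }

    // Count ERROR entries per "component/subcomponent" for a single RequestId.
    static Map<String, Long> errorCounts(List<LogEntry> logs, String requestId) {
        return logs.stream()
                .filter(e -> requestId.equals(e.requestId))
                .filter(e -> "ERROR".equals(e.loglevel))
                .collect(Collectors.groupingBy(
                        e -> e.componentName + "/" + e.subComponent,
                        TreeMap::new,              // sorted keys for stable output
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<LogEntry> logs = List.of(
                new LogEntry("req-1", "SO", "BPMN", "INFO", "started"),
                new LogEntry("req-1", "SO", "BPMN", "ERROR", "timeout"),
                new LogEntry("req-1", "SDNC", "API", "ERROR", "refused"),
                new LogEntry("req-2", "SO", "BPMN", "ERROR", "other request"));
        System.out.println(errorCounts(logs, "req-1"));
    }
}
```

Grouping on componentName and subComponent only works if both are populated consistently by the pipeline, which is why the field list above is treated as a prerequisite.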

Create Kibana Scripted Fields

The following two fields must be created manually by the user against the index pattern "logstash-*" on the Kibana scripted fields page (Management → Index Patterns → Scripted fields):

...

  • Drill Down: URL link to the subcomponent level dashboard (see below for the details) 

       Script 

    • Paste the code below into the 'Script' field of the above page.

    • The drill-down time range is preset to run from 60 seconds before to 5 seconds after the Timestamp of the ERROR log; change the values as necessary.

    • Change the IP address in the URL to that of the user's ONAP Kibana server.
    • The dashboard ID (105a83d0- ...  in the code) must match the ID of the imported dashboard "Culprit Locator (Subcomponent Level)".
Code Block
// Build a drill-down URL only for ERROR-level log entries.
if (doc['loglevel.keyword'].value == "ERROR") {
  // Use "yyyy" (calendar year); "YYYY" is the week-based year and can be off by one near New Year.
  DateTimeFormatter ft = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
  // Time window: 60 seconds before to 5 seconds after the log's Timestamp (in milliseconds).
  Instant from_instant = Instant.ofEpochMilli(doc['Timestamp'].value - 60000);
  Instant to_instant = Instant.ofEpochMilli(doc['Timestamp'].value + 5000);
  ZonedDateTime from_z = from_instant.atZone(ZoneId.of("UTC"));
  ZonedDateTime to_z = to_instant.atZone(ZoneId.of("UTC"));
  String from_ts = from_z.format(ft);
  String to_ts = to_z.format(ft);

  // Replace the IP address and the dashboard ID below with the values for your Kibana server.
  String ReqId = doc['RequestId.keyword'].value;
  String URL = "http://10.147.58.138:5601/app/kibana#/dashboard/105a83d0-215e-11e8-a744-95cec6f8f1a4?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'" + from_ts + "',mode:absolute,to:'" + to_ts + "'))&_a=(description:'',filters:!(),options:(darkTheme:!f),query:(query_string:(analyze_wildcard:!t,query:'RequestId:%20%22" + ReqId + "%22')),timeRestore:!f,viewMode:view)";
  return URL;
}
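
The time-window arithmetic in the script can be checked outside Kibana with plain Java, since Painless exposes the same java.time classes. The following is a minimal sketch under that assumption; the class `DrillDownWindow` and its method names are hypothetical and exist only for this illustration. Note that in java.time patterns `yyyy` is the calendar year, whereas `YYYY` is the week-based year and can differ from it in the days around New Year.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DrillDownWindow {
    // Calendar-year pattern, rendered in UTC like the scripted field.
    static final DateTimeFormatter FT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    // Format (epochMillis + offsetMillis) as a UTC timestamp string.
    static String shifted(long epochMillis, long offsetMillis) {
        return FT.format(Instant.ofEpochMilli(epochMillis + offsetMillis));
    }

    // Build the time:(...) fragment of the URL for an ERROR logged at errorMillis:
    // 60 seconds before to 5 seconds after, as in the script.
    static String timeFragment(long errorMillis) {
        return "time:(from:'" + shifted(errorMillis, -60_000L)
                + "',mode:absolute,to:'" + shifted(errorMillis, 5_000L) + "')";
    }

    public static void main(String[] args) {
        // Sample timestamp chosen so the window straddles a year boundary.
        long errorMillis = Instant.parse("2018-12-31T23:59:58Z").toEpochMilli();
        System.out.println(timeFragment(errorMillis));
    }
}
```

Widening or narrowing the window only requires changing the two offsets, which mirrors the -60000 and +5000 constants in the scripted field.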

Import Kibana Objects

Two approaches are provided for importing the Kibana objects (dashboards/visualizations/searches).

...

Below is a sample zoomed-in view of the red box in the previous view:

Feature Improvements

The next step for Culprit Locator is to automate the identification of root causes with more intelligent features.

...