What is Culprit Locator

Culprit Locator (or root-cause finder) is a set of ELK-based dashboards that mine log information to identify where within a failed flow a problem originated. The intention is to help testers and developers move issues from identification to diagnosis and fix more quickly. The application demonstrates how traceable logs can be used to enhance troubleshooting. The current version relies on explicit ERROR logs to quickly locate the source of a problem in terms of components, subcomponents, or a series of significant log details.

Features

  • Given a RequestId (or TransactionId), retrieve all related logs across all components for investigation
  • Aggregate and visualize the logs by log level, highlighting the meaningful logs with color coding
  • List all significant logs and their details
  • Provide a drilldown link for further investigation at the subcomponent level, which displays 
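
The first three features can be illustrated outside Kibana with a minimal sketch. It assumes each log record is a dictionary; the field names (`RequestId`, `component`, `level`, `message`) and the sample records are hypothetical and are not taken from the actual logstash schema or dashboard definitions.

```python
from collections import Counter

# Hypothetical sample records; field names are assumptions for illustration only.
LOGS = [
    {"RequestId": "req-1", "component": "component-a", "level": "INFO", "message": "request received"},
    {"RequestId": "req-1", "component": "component-b", "level": "ERROR", "message": "timeout"},
    {"RequestId": "req-2", "component": "component-a", "level": "INFO", "message": "request received"},
    {"RequestId": "req-1", "component": "component-b", "level": "WARN", "message": "retrying"},
]


def logs_for_request(logs, request_id):
    """Return every record, from any component, that carries the given RequestId."""
    return [rec for rec in logs if rec["RequestId"] == request_id]


def count_by_level(logs):
    """Count records per (component, level) pair, as a pie chart would aggregate them."""
    return Counter((rec["component"], rec["level"]) for rec in logs)


def significant_logs(logs, levels=("ERROR", "WARN")):
    """Keep only the records whose level is considered significant."""
    return [rec for rec in logs if rec["level"] in levels]


related = logs_for_request(LOGS, "req-1")
counts = count_by_level(related)
important = significant_logs(related)
```

Which levels count as "significant" is a choice made here for the example; the dashboards may use a different set.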

Data Requirements (logstash)


Kibana Preparation



Kibana Objects Import



How To Use

1. Open the dashboard "Culprit Locator (Component Level)" and adjust the time picker to the period you are interested in, or choose a preset such as "Last 7 days". The dashboard features:


  • A pie-in-pie chart showing the total log count per component for each log level
  • A log-tracking visualization across the components over time: each box represents the collection of logs in a time slot and is colored by the most severe log level in that slot (see the legend in the chart title; for example, a red box indicates that at least one ERROR log exists in the time slot). On this chart, you can zoom in on a specific time frame by dragging the mouse, or click any box to drill down into its logs
  • A table of ERROR logs and the components involved. Each error record provides a link (displayed as "view") to a separate drill-down dashboard based on the timestamp of the error
  • A table of all relevant logs at all log levels (scroll down the dashboard to see it)
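
The "most severe log level per time slot" idea behind the log-tracking chart can be sketched as follows. This is an illustration under stated assumptions, not the Kibana visualization itself: the severity ordering, the 60-second slot size, and the field names (`component`, `level`, `epoch`) are all hypothetical.

```python
# Assumed severity ordering; higher numbers are more severe.
SEVERITY = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}


def worst_level_per_slot(logs, slot_seconds=60):
    """Map (component, slot_start) to the most severe level seen in that slot.

    Each record is assumed to carry an 'epoch' field in whole seconds.
    """
    slots = {}
    for rec in logs:
        slot_start = rec["epoch"] - rec["epoch"] % slot_seconds
        key = (rec["component"], slot_start)
        current = slots.get(key)
        # Keep the level with the highest severity rank for this box.
        if current is None or SEVERITY[rec["level"]] > SEVERITY[current]:
            slots[key] = rec["level"]
    return slots


sample = [
    {"component": "component-a", "level": "INFO", "epoch": 0},
    {"component": "component-a", "level": "ERROR", "epoch": 30},
    {"component": "component-a", "level": "INFO", "epoch": 70},
    {"component": "component-b", "level": "WARN", "epoch": 10},
]
boxes = worst_level_per_slot(sample)
```

In this sample, the first slot for `component-a` would be drawn as an ERROR box because one ERROR record falls inside it, even though the slot also contains an INFO record.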


2. The drill-down (more focused) dashboard opens automatically in a separate tab, with the time period set from 60 seconds before the error time to 5 seconds after it. You can still zoom in or click the boxes. The dashboard features:

  • Log tracking by subcomponent over time, using the same color scheme as the previous dashboard
  • A table of error message patterns that categorizes the different error messages (to be improved)
  • A scrollable table of all relevant logs at all log levels, for investigating the logs in the specified time period
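
The drill-down time range described in step 2 (60 seconds before the error to 5 seconds after it) can be computed as in the sketch below. The function names and the `timestamp` field are hypothetical; how the "view" link actually encodes this range is not shown here.

```python
from datetime import datetime, timedelta, timezone


def drilldown_window(error_time, before_seconds=60, after_seconds=5):
    """Return the (start, end) range around an error timestamp."""
    start = error_time - timedelta(seconds=before_seconds)
    end = error_time + timedelta(seconds=after_seconds)
    return start, end


def logs_in_window(logs, error_time):
    """Keep the records whose 'timestamp' falls inside the drill-down range."""
    start, end = drilldown_window(error_time)
    return [rec for rec in logs if start <= rec["timestamp"] <= end]


# Example error time, chosen arbitrarily for the illustration.
error_time = datetime(2018, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
start, end = drilldown_window(error_time)

sample = [
    {"timestamp": error_time - timedelta(seconds=90), "level": "INFO"},
    {"timestamp": error_time - timedelta(seconds=10), "level": "WARN"},
    {"timestamp": error_time, "level": "ERROR"},
    {"timestamp": error_time + timedelta(seconds=30), "level": "INFO"},
]
nearby = logs_in_window(sample, error_time)
```

The window is asymmetric because the logs leading up to an error are usually more informative for diagnosis than those that follow it.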

Improvements 




