
JIRA: LOG-395

1. Upgrade of ELK & Potential Feature Development (AAI search-data-service)

ELK Upgrade

    The Bath team (in charge of search-data-service; @Colin Burns) is planning to upgrade Elasticsearch to 6.1.2 (based on AT&T-approved versions).

  • Current ELK versions: Elasticsearch 2.4, Kibana 4.6 (no Logstash is in use)
  • To build the dashboards with the enhanced Kibana features, upgrading the whole ELK stack to at least 5.6 is desired (note: the Logging project uses 5.5)
  • Upgrading from 2.x to 5.x requires a "Full Cluster-restart Upgrade"
  • search-data-service should reflect this upgrade:
    • deploy/configure the right versions
    • potentially update the relevant API methods for Elasticsearch data management (see the sketch below)
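
As one concrete example of the API impact: Elasticsearch 5.x replaced the 2.x "string" field type with "text"/"keyword", and 6.x allows only a single mapping type per index, so 2.x-era index/mapping code will likely need rework. Below is a minimal sketch using the official Python client; the index name and fields are illustrative assumptions, not the actual search-data-service schema:

```python
# Minimal sketch (not the actual search-data-service code): create an index
# and store a validation document against Elasticsearch 5.x/6.x.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# 6.x allows only one mapping type per index, and 5.x replaced the 2.x
# "string" type with "text"/"keyword" - 2.x-era mappings need rework.
es.indices.create(
    index="validations",
    body={
        "mappings": {
            "validation": {  # the single type allowed per index in 6.x
                "properties": {
                    "validationId": {"type": "keyword"},
                    "validationTimestamp": {"type": "date"},
                }
            }
        }
    },
    ignore=400,  # tolerate "index already exists"
)

es.index(
    index="validations",
    doc_type="validation",
    body={
        "validationId": "example-001",
        "validationTimestamp": "2018-06-01T12:00:00Z",
    },
)
```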

   Specifically for POMBA use, Groundhog could provide:

  • automatic deployment of Kibana (version 6.1.2) through OOM (currently it is installed manually), plus configuration/installation of all POMBA dashboards
  • if necessary for parsing any audit results, automatic deployment of Logstash (version 6.1.2) through OOM


Feature Enhancements (Questions)

  • Is any enhancement (e.g., adding more fields) desired for the validation/violation data being pushed to Elasticsearch?
  • Do violationDetails (which identify the exact discrepancies; see the sample event below under 'violations') need to be sent/stored by data-router in separate fields, or parsed out (e.g., using Logstash)? Such nested info cannot be used directly in Kibana visualizations; see the flattening sketch after this list.
  • We could parse out the components involved in the violations to show violation stats factored by component.
  • Would "elapsed time since orchestration" be useful? The result could change for the same audit request at different times since orchestration.
  • Would "audit duration" stats be useful? i.e., the time taken for the auditing (from trigger to result).
  • Is any other metadata useful? e.g., who invoked the validation (user, department).
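
Since Kibana cannot aggregate over such nested objects, one option (whether in data-router itself or in a Logstash filter) is to flatten violationDetails into top-level fields before indexing. A minimal Python sketch of the idea; the event shape and field names are assumptions, not the actual schema:

```python
def flatten_violation(event):
    """Copy nested violationDetails entries up to top-level fields so
    Kibana can aggregate on them (the event shape is an assumed example)."""
    flat = {k: v for k, v in event.items() if k != "violationDetails"}
    for key, value in (event.get("violationDetails") or {}).items():
        # e.g. violationDetails["expected"] -> "violationDetails_expected"
        flat["violationDetails_" + key] = value
    return flat

sample = {
    "violationId": "v-001",
    "severity": "CRITICAL",
    "violationDetails": {"attribute": "vnf-type", "expected": "A", "actual": "B"},
}
print(flatten_violation(sample))
# {'violationId': 'v-001', 'severity': 'CRITICAL',
#  'violationDetails_attribute': 'vnf-type', ...}
```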


  (Note) Below are sample validation and violation events currently stored in ES; these will be the data source for the Kibana dashboards. (The embedded samples did not survive in this page version; an illustrative sketch of the event shape follows.)
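
Purely for orientation, a mock-up of what a validation event with one violation might look like, based only on the fields referenced elsewhere on this page; this is not an actual stored event:

```json
{
  "validationId": "example-validation-001",
  "validationTimestamp": "2018-06-01T12:00:00Z",
  "serviceInstanceId": "example-svc-instance",
  "modelInvariantId": "example-model-invariant-id",
  "modelVersionId": "example-model-version-id",
  "violations": [
    {
      "validationRule": "example-rule",
      "severity": "CRITICAL",
      "violationDetails": {
        "attribute": "example-attribute",
        "expected": "A",
        "actual": "B"
      }
    }
  ]
}
```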

          


2. Dashboard Ideas

The visualizations and dashboards will need to be designed and created according to the current and potential use cases of the POMBA services: what the users want/need to check, and how the system could help improve overall platform integrity. We want the POMBA reporting to be informative, insightful, and intuitive from the user's perspective.

Challenges

  • We have created a few sample validation rules, but we do not know all the rules that users will create in production. This means we can most likely provide only some high-level dashboards; for specific rules and details, we can provide sample dashboards that give end users an idea of how to create their own dashboards customized for their use cases.
  • For Network Discovery, what specific audits will be executed, and what kinds of audit results are expected?


Dashboard List

(Note) One dashboard type could require multiple dashboard pages, depending on the number of visualizations.


(Each entry below gives the dashboard type, a description of what the user wants to see, and the required information to show as visualizations.)

1. Overall Audit Monitor

As a general admin, I want to see the whole platform's integrity - the health status in terms of all configured validation rules.

  • validation total count for the specified time period
  • violation total count for the specified time period
  • validation count over time (trend)
  • violation count over time (trend)
  • violation count by component involved
  • violation count by validation rule
  • violation count by rule severity
  • audit KPI trend: daily violation metric against total validation count (a weighted measure considering the severity level)
  • validation list
  • violation list

2. Overall Audit Analysis

Which kinds of validations are mostly executed against which models, and which kinds of violations mostly occur in which components.

  • validation stats (e.g., validationId or serviceInstanceId count) by modelInvariantId and modelVersionId
  • which models produced how many violations of which type
  • violation stats by validation model, rule, etc.
  • views from the perspective of each component

3. Individual Audit Analysis

Given a validation job, the user wants to see and quickly recognize all relevant violations detected by POMBA.

  • validation details and related violation details on the same page
  • (stretch) given the same type of validation job, can we retrieve the historical violation results to indicate whether this is a genuinely unusual case or one that used to happen?

4. Violation Analysis for Network Discovery

For the specific use cases of Network Discovery, the user wants to see the audit stats.

  • TBD

5. Violation Summary Report

Provide a list of violation cases for any potential fixes.

  • list of up-to-date violations with detailed info that helps raise/fix any issues

6. Cure History (stretch)

For the same validation category (e.g., the same rule and model id, component set?), the user wants assurance that the violation has been fixed and is gone now, and wants insight into how much POMBA helps improve overall system integrity.

  • violation stats over time for the same validation category (e.g., 10 violations reduced to 0 now)
  • KPIs showing the violation resolution trend (e.g., how many violations have been fixed on a daily basis)
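
To give a feel for how these panels map onto the stored data, a "violation count by rule severity" visualization could be backed by a simple terms aggregation. A minimal sketch with the Python Elasticsearch client; the index name and severity field are assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Terms aggregation behind a "violation count by rule severity" panel;
# the "violations" index and "severity" field are assumed names.
resp = es.search(
    index="violations",
    body={
        "size": 0,  # we only want the aggregation buckets, not hits
        "aggs": {"by_severity": {"terms": {"field": "severity"}}},
    },
)
for bucket in resp["aggregations"]["by_severity"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```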





Supportable Features

  • Where necessary, provide links to switch back and forth between dashboards: e.g., from a violation page to the page displaying its validation info
  • Color coding for critical violations


3. Data Generation

For development purposes, we need a certain amount of audit-result data consisting of various types of validation and violation cases. Ideally, the data should reflect production reality.

Approach 1: Execute the audits in the IST lab (or production) and bring back the audit results.

  • Script A runs to collect a list of info that will be used as arguments in the audit requests: serviceInstanceId, modelInvariantId, modelVersionId, customerId, serviceType
  • Script B runs to send audit requests based on the data above; the requests need to be distributed over time to be realistic (see the sketch after this list)
  • Manually collect the Elasticsearch dump (which will contain all the audit validation/violation events) and import it into the dev lab
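
A minimal sketch of what Script B could look like; the trigger endpoint, port, and payload fields are assumptions, not the actual POMBA API:

```python
import json
import random
import time

import requests

# Hypothetical trigger endpoint - the real POMBA endpoint/port may differ.
TRIGGER_URL = "http://pomba-validation-trigger:8080/validate"

def send_audit_requests(instances, max_gap_seconds=300):
    """Send one audit request per service instance collected by Script A,
    spacing requests out with random gaps so the load looks realistic."""
    for inst in instances:
        payload = {
            "serviceInstanceId": inst["serviceInstanceId"],
            "modelInvariantId": inst["modelInvariantId"],
            "modelVersionId": inst["modelVersionId"],
            "customerId": inst["customerId"],
            "serviceType": inst["serviceType"],
        }
        resp = requests.post(TRIGGER_URL, json=payload)
        print(inst["serviceInstanceId"], resp.status_code)
        time.sleep(random.uniform(0, max_gap_seconds))

if __name__ == "__main__":
    # instances.json is the assumed output format of Script A
    with open("instances.json") as f:
        send_audit_requests(json.load(f))
```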

Approach 2: Collect the component info and copy it to the dev lab.

  • Script X runs to GET all the info of interest from each component in IST or production
  • Script Y PUTs that info into the corresponding components in the dev lab (see the sketch after this list)
  • Run Script A
  • Run Script B
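
A minimal sketch of the Script X / Script Y copy step; the component endpoints and resource paths are placeholders, not actual component APIs:

```python
import requests

# Placeholder endpoints - the real component APIs (A&AI, SDC, SO, ...)
# each have their own resource paths and authentication.
SOURCE = "http://ist-lab-component:8080/resources"
TARGET = "http://dev-lab-component:8080/resources"

def copy_resources():
    """GET each resource of interest from the source lab (Script X)
    and PUT it into the corresponding dev-lab component (Script Y)."""
    for res in requests.get(SOURCE).json():
        resp = requests.put("{}/{}".format(TARGET, res["id"]), json=res)
        print(res["id"], resp.status_code)

if __name__ == "__main__":
    copy_resources()
```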

After that, as necessary, we could manipulate the data to generate many different types of violations:

  • Manually update the data in some components to generate special violation cases


