- LOG-395
1. Upgrade of ELK & Potential Feature Development (AAI search-data-service)
ELK Upgrade
The Bath team (in charge of search-data-service, @Colin Burns) is planning to upgrade Elasticsearch to 6.1.2 (based on AT&T-approved versions).
- Current ELK versions: Elasticsearch 2.4, Kibana 4.6 (Logstash is not used)
- To create the dashboards with the enhanced Kibana features, upgrading the entire ELK stack to version 5.6 or later is desired (note: the Logging project uses 5.5)
- Upgrading from 2.x to 5.x requires a full-cluster-restart upgrade
- search-data-service should reflect this upgrade:
  - deploy/configure the correct versions
  - potentially update the relevant API methods for Elasticsearch data management
Specifically for POMBA use, Groundhog could provide:
- automatic deployment of Kibana (version 6.1.2) through OOM (currently it is installed manually), plus configuration/installation of all POMBA dashboards
- if needed for parsing any audit results, automatic deployment of Logstash (version 6.1.2) through OOM
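Because the 2.x-to-5.x jump cannot be done as a rolling upgrade, a deployment script may want to pick the upgrade strategy from the version pair. A minimal sketch, assuming Elastic's documented rule that rolling upgrades are supported within a major version or from the last 5.x minor (5.6) to 6.x; the function name is illustrative:

```python
def upgrade_strategy(current: str, target: str) -> str:
    """Decide whether an Elasticsearch upgrade can be rolling or
    needs a full cluster restart, based on the version pair."""
    cur_major, cur_minor = (int(x) for x in current.split(".")[:2])
    tgt_major, _ = (int(x) for x in target.split(".")[:2])
    if cur_major == tgt_major:
        # Minor-version upgrades within a major can be rolling.
        return "rolling"
    if tgt_major == cur_major + 1 and (cur_major, cur_minor) == (5, 6):
        # 5.6 -> 6.x is the one cross-major rolling path Elastic supports.
        return "rolling"
    # Everything else (e.g. 2.4 -> 5.6 or 2.4 -> 6.1.2) needs a full restart.
    return "full-cluster-restart"
```

So a 2.4 cluster must go through a full cluster restart whether the target is 5.6 or 6.1.2.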
Feature Enhancements
- Should any of the validation/violation data being pushed to Elasticsearch change?
- Should violationDetails (which identifies the exact discrepancies; see the 'violations' field in the sample event below) be sent/stored by data-router, or parsed out (e.g., using Logstash)? Such nested info cannot be used directly in Kibana visualizations.
- Would any other metadata be useful? e.g., who invoked the validation (user, department)
- Would "elapsed time since orchestration" be useful, to indicate when the instance content might drift from the intended info, if that can happen?
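If violationDetails stays nested, one option is to flatten it into dotted top-level fields before indexing (the kind of transformation data-router or a Logstash filter could perform) so Kibana can aggregate on them. A sketch; the field names in the example are hypothetical, not the actual POMBA event schema:

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Recursively flatten nested dicts into dotted keys, e.g.
    {"violationDetails": {"attribute": "x"}} -> {"violationDetails.attribute": "x"},
    so each value becomes a flat, aggregatable field."""
    flat = {}
    for key, value in obj.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat
```

Deeper nesting flattens the same way, producing one dotted key per leaf value.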
(Note) Below are the sample validation and violation events currently stored in Elasticsearch; they will be the data source for the Kibana dashboards.
2. Dashboard Ideas
The visualizations and dashboards will need to be designed and created according to the current and potential use cases of the POMBA services: what the users want/need to check, and how the system could help improve overall platform integrity. We want POMBA reporting to be informative, insightful, and intuitive from the user's perspective.
Challenges
- We have created a few sample validation rules, but we do not know all the rules users will create in production. This means we can most likely provide only some high-level dashboards; for specific rules and details, we can provide sample dashboards that give the end users an idea of how to create their own dashboards customized for their use cases.
- For Network Discovery, which specific audits will be executed, and what kinds of audit results are expected?
Dashboard List
(Note) One dashboard type may need multiple dashboard pages, depending on the number of visualizations.
| # | Dashboard Type | Description (What the User Wants to See) | Required Information To Show (Visualizations) |
|---|---|---|---|
| 1 | Overall Audit Monitor | As a general admin, I want to see overall platform integrity: health status in terms of all configured validation rules. | |
| 2 | Overall Audit Analysis | Which kinds of validations are executed most, and against which models; which kinds of violations occur most, and in which components. | |
| 3 | Individual Audit Analysis | Given a validation job, the user wants to see and quickly recognize all relevant violations detected by POMBA. | |
| 4 | Violation Analysis for Network Discovery | For the specific Network Discovery use cases, the user wants to see the audit statistics. | |
| 5 | Violation Summary Report | Provide a list of violation cases for any potential fixes. | |
| 6 | Cure History (stretch) | For the same validation category (e.g., same rule and model ID, component set?), the user wants assurance that the violation has been fixed and is gone, and wants insight into how much POMBA helps improve overall system integrity. | |
Supportable Features
- Where necessary, provide links to switch between dashboards: e.g., from a violation page to the page displaying its validation info
- Color coding for critical violations
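Color coding could be driven by a simple severity-to-color mapping applied when a visualization is defined. A sketch; the severity levels and hex colors below are assumptions, not an existing POMBA scheme:

```python
# Hypothetical severity levels mapped to dashboard colors.
SEVERITY_COLORS = {
    "critical": "#d62728",  # red: needs immediate attention
    "major":    "#ff7f0e",  # orange
    "minor":    "#ffd700",  # yellow
    "info":     "#1f77b4",  # blue
}

def color_for(severity: str) -> str:
    """Return the display color for a violation severity,
    falling back to neutral grey for unknown values."""
    return SEVERITY_COLORS.get(severity.lower(), "#7f7f7f")
```

Keeping the mapping in one place makes critical violations stand out consistently across all dashboard pages.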
3. Data Generation based on Audit Use Cases
The user could take several approaches to execute audits and generate audit results:
- Event-Driven Individual Auditing: e.g., a post-orchestration audit triggered by the system or a user for a single service instance
- Combined Audit Now: audit the selected service types and rules with a one-click command. This requires collecting and maintaining a list of service instance info across the entire platform.
- Continuous Scheduled Auditing: automated "Combined Audit" runs for a selected set of services (existing and/or new). For services that need special care or are of special interest, a different schedule could be configured (e.g., more frequent validation).
Configurations
- Audit target selection: which microservices should be included and cross-checked
- Audit rule selection: which rules should be validated for the target services
- Scheduling parameters: when, and which rules, will be applied
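The three configuration knobs above could be captured in one structure. A sketch; the service and rule names are placeholders, and the per-service schedule override illustrates the "special care" case from Continuous Scheduled Auditing:

```python
# Illustrative audit configuration (all names are placeholders).
audit_config = {
    "targets": ["service-a", "service-b"],            # microservices to cross-check
    "rules": ["attribute-comparison", "completeness"],  # rules to validate
    "schedules": {
        "default": "0 2 * * *",      # nightly run for most services
        "service-b": "0 */6 * * *",  # every 6 hours for a service of special interest
    },
}

def schedule_for(service: str, config: dict) -> str:
    """Return the cron schedule for a service, falling back to the default."""
    return config["schedules"].get(service, config["schedules"]["default"])
```

A scheduler would then iterate over `targets`, look up each service's schedule, and trigger the selected rules at the resulting times.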