...

In addition, please see Control Loop Operation and Improvements for more details on the problems and suggested improvements for operational policies to be addressed in Dublin. Some of this work will be fixed in the current Seed Code architecture, as the Drools PDP will be able to support both architectures in Dublin.

Work Proposal to Fix Current Architecture Reliability Issues with policy CRUD and deployment

Casablanca stability testing for SDC Service Distribution exposed the following erratic behavior of the PDP engine with respect to policy CRUD:

Jira: POLICY-1277 (ONAP JIRA)

The following work is proposed to improve the reliability of the Policy Engine/API component of the Policy Seed Code Architecture for Dublin. It requires a Medium-sized T-Shirt effort:

  • Database layer changes to deal with failures and timeouts
    • The code does not handle DB layer failures well. All DB interactions need to be inspected and enhanced.
  • Network layer changes to deal with failures and timeouts
    • The same is true for communication between components.
  • Consistency checking code will be added and invoked at certain intervals and on demand
    • Need to be able to audit, on the fly, that all components are in sync.
  • Retry logic will be inserted where applicable
    • Retry code is non-existent for some DB and component interactions and needs to be added.
  • Tier2 alerting will be inserted where applicable
    • Integration from logfiles to internal systems.
  • Health checks will be instituted to monitor component health
    • The existing health checks are basic. Add more checks where possible.
  • Deployment processes will be hardened to ensure environment-specific configurations and properties files are deployed
    • Need post-deployment auditing scripts created and run automatically on deployments.
  • Improve the policy creation process to remove the need to update every time a policy is pushed and to avoid repackaging the rules jar, since the improved process just inserts facts into working memory. This takes a 9-step process down to about 2 steps.
    • Will be removed, covered by ONAP changes.
  • Recovery tools to reduce the outage interval (stretch goal)
    • Tooling to aid recovery, which is manual today.
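
As an illustration of the retry item in the list above, the following is a minimal sketch of retry with exponential backoff around a DB or inter-component call. It is not taken from the Policy code base: the helper name call_with_retry, its parameters, and the choice of ConnectionError and TimeoutError as the retryable failures are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a hypothetical DB or component call, retrying transient failures.

    The delay doubles after each failed attempt (0.5 s, 1.0 s, ...).
    Exceptions not listed in `retryable` propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                # Out of attempts: re-raise so the caller can log or alert.
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_lookup() -> str:
        # Stand-in for a DB query that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("database unavailable")
        return "policy-record"

    print(call_with_retry(flaky_lookup, base_delay_s=0.01))
```

Passing sleep as a parameter keeps the helper testable without real waiting. Re-raising after the last attempt leaves the final failure visible to whatever alerting or health-check layer sits above the call.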

...