
In progress

We will use the vFW use case as the baseline for this test.

Prerequisite: instantiate a vFW with the closed loop running.

  • Error detection is very fast: less than 1 second.
  • Recovery (example commands for both cases are sketched below):
    • Killing the docker container: the system normally returns to a normal state in less than 1 minute (SDNC and APPC can take up to 5 minutes).
    • Deleting the pod: recovery normally takes much longer, especially for SDNC and APPC (up to 15 minutes).
  • Note: a Helm upgrade sometimes corrupted the whole system and left it unusable. However, we think this is not a normal use case for a production environment.
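
A rough sketch of the two recovery tests above (the pod and container names are examples only; check kubectl get pods -n onap for the real ones):

  # Case 1: kill the docker container in place (run on the k8s worker node hosting it;
  # the grep pattern is illustrative)
  docker ps | grep sdnc            # find the container id
  docker kill <container-id>       # kubelet restarts the container in the same pod

  # Case 2: delete the pod and let Kubernetes reschedule it
  kubectl delete pod dev-sdnc-0 -n onap    # example pod name
  kubectl get pods -n onap -w              # watch until the replacement is Running and Ready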
Results are grouped by category below. Each entry lists the component placed in error mode, the time to detect the failure and repair it, whether the test passed, and notes from the run.

VNF Onboarding and Distribution

  • SDC: < 5 minutes. Pass.
    Timing: roughly 30 minutes overall. A script kills the targeted components at random while VNF onboarding continues (a sketch of this kill loop follows this section). The distribution test is driven by:
      ete-k8s.sh onap healthdist
    After kicking off the command we waited 1 minute and then killed SDC. The first distribution failed; we then redistributed and it succeeded.

  • SO: < 5 minutes. Pass.
    After kicking off the command we waited 1 minute and then killed SO. The first distribution failed; we then redistributed and it succeeded.

  • A&AI: < 5 minutes. Pass.
    1. Killed aai-modelloader; it finished the task in 3:04 minutes.
    2. Killed two aai-cassandra pods; it finished the task in ~1 minute.

  • SDNC: < 8 minutes. Pass.
    1. Ran preload using scripts.
    Deleting the SDNC pod took a very long time to come back, possibly because of network issues, and it left us with a very "weird" system in which SDC gave us an error.

  • SDNC: < 5 minutes. Pass.
    1. Deleted one of the SDNC containers, e.g. sdnc-0.
    2. Ran health and preload.

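A rough sketch of the random-kill loop described above, assuming the OOM robot script ete-k8s.sh is run from its usual directory; the grep pattern used to pick a victim pod is only an example:

  # start the health/distribution robot test in the background
  ./ete-k8s.sh onap healthdist &

  # while the distribution runs, pick one of the target pods at random and delete it
  # (pod name patterns are illustrative; check them with kubectl get pods -n onap)
  sleep 60                                            # let the test get going first
  VICTIM=$(kubectl get pods -n onap -o name | grep -E 'sdc-be|so-|aai-modelloader|sdnc-0' | shuf -n 1)
  kubectl delete -n onap "$VICTIM"
  kubectl get pods -n onap -w                         # watch the replacement pod come back
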

VNF Instantiation

  • SDC: < 2 seconds. Pass.
    Tested by manually killing the docker container.

  • VID: < 1 minute. Pass.
    1. kubectl delete pod dev-vid-6d66f9b8c-9vdlt -n onap   # back in 1 minute
    2. kubectl delete pod dev-vid-mariadb-fc95657d9-wqn9s -n onap   # back in 1 minute

  • SO: 5 minutes. Pass.
    The SO pod restarted as part of hard-rebooting 2 of the 9 k8s VMs.

  • A&AI: 20 minutes. Pass.
    Restarted aai-model-loader, aai-hbase, and aai-sparky-be by hard-rebooting 2 more k8s VMs. Recovery probably took extra time because many other pods were restarting at the same time and took a while to converge (a sketch for timing this convergence follows this section).

  • SDNC: 5 minutes. Pass.
    SDNC pods restarted as part of hard-rebooting 2 of the 9 k8s VMs.

  • MultiVIM: < 5 minutes. Pass.
    Deleted the multicloud pods and verified that the replacement pods can orchestrate VNFs as usual.
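
A rough sketch for measuring convergence time, i.e. how long until every pod in the onap namespace is Running and fully Ready again after a reboot or deletion:

  # crude convergence timer: poll until no pod is left in a non-Running or
  # not-fully-Ready state (Completed job pods are ignored), then print elapsed time
  START=$(date +%s)
  while kubectl get pods -n onap --no-headers | \
        awk '{split($2, r, "/");
              if ($3 != "Completed" && (r[1] != r[2] || $3 != "Running")) n++}
             END {exit !n}'; do
      sleep 10
  done
  echo "converged after $(( $(date +%s) - START )) seconds"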

Closed Loop (pre-installed manually)

  • DCAE: (no results recorded)

  • DMaaP: (no results recorded)

  • Policy (Policy documentation: Policy on OOM): 15 minutes. Pass.
    Deleted dev-pdp-0: no discernible interruption to the closed loop; the pod restarted in 2 minutes.
    Deleted dev-drools-0: the closed loop failed immediately; the pod restarted in 2 minutes and the closed loop recovered in 15 minutes.
    Deleted dev-pap-5c7995667f-wvrgr: no discernible interruption to the closed loop; the pod restarted in 2 minutes.
    Deleted dev-policydb-5cddbc96cf-hr4jr: no discernible interruption to the closed loop; the pod restarted in 2 minutes.
    (A sketch for timing a single pod's restart follows this section.)

  • A&AI: never recovered (observed for > 1 hour). Fail.
    Deleted aai-modelloader: the closed loop failed immediately. Even though the aai-modelloader container restarted within a couple of minutes (when restarted on a VM that already had the image), the closed loop never recovered.

  • APPC (3-node cluster): 20 minutes. Pass.
    Deleted dev-appc-0: the closed loop failed immediately; the dev-appc-0 pod restarted in 15 minutes and the closed loop recovered in 20 minutes.

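A rough sketch of how a single pod's restart can be timed (dev-drools-0 is just one of the pod names from the notes above and can be swapped for any other):

  # delete the pod, give the StatefulSet controller a moment to recreate it,
  # then time how long the replacement takes to become Ready
  kubectl delete pod dev-drools-0 -n onap
  sleep 30
  time kubectl wait --for=condition=Ready pod/dev-drools-0 -n onap --timeout=30m
  # this only measures pod readiness; closed-loop recovery can lag behind it
  # (e.g. 15 minutes for drools vs. 2 minutes for the pod, as recorded above)
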

Requirement

  • Area: Resiliency
  • Priority: High
  • Min. Level: Level 2 – run-time projects; Level 1 – remaining projects
  • Stretch Goal: Level 3 – run-time projects; Level 2 – remaining projects
  • Level Descriptions (Abbreviated):
    • 1 – manual failure and recovery (< 30 minutes)
    • 2 – automated detection and recovery, single site (< 30 minutes)
    • 3 – automated detection and recovery (geo redundancy)
