
In progress

We will use the vFW use case as the baseline for this testing:

Prerequisite: a vFW instantiated with the closed loop running.

  • Error detection is very fast: less than 1 second.
  • Recovery:
    • Killing a docker container: the system normally returns to a healthy state in less than 1 minute (SDNC and APPC can take up to 5 minutes).
    • Deleting the pod: recovery normally takes much longer, especially for SDNC and APPC (up to 15 minutes). Both failure-injection methods are sketched below.
  • Note: helm upgrade sometimes corrupted the whole system, leaving it in an unusable state. However, we think this may not be a normal use case for a production environment.
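Both failure modes can be injected with standard commands. A minimal sketch, assuming an OOM deployment in the onap namespace; the pod and container names are illustrative, not taken from this test run:

  # Kill only the docker container, on the k8s node that hosts it:
  # kubelet restarts the container in place, so recovery is usually < 1 minute.
  docker ps | grep sdnc            # find the container id on that node
  docker kill <container-id>       # placeholder: substitute the real id

  # Delete the whole pod instead, forcing a full reschedule (slower recovery):
  kubectl -n onap delete pod dev-sdnc-0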
Test results. Each row lists: Category | In Error Mode Component | Time to Detect Failure and Repair | Pass? | Notes.

VNF Onboarding and Distribution | SDC | < 5 minutes | Pass

Notes: total timing ~30 minutes. A script killed these components randomly while VNF onboarding continued (a sketch of such a script follows below). Distribution health was checked with:

ete-k8s.sh onap healthdist

After kicking off the command, we waited 1 minute and then killed SDC. The first distribution failed; after redistributing, it succeeded.
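The random-kill script itself is not included on this page; a minimal sketch of what it likely amounts to, assuming kubectl access to the onap namespace (the component name patterns are assumptions):

  #!/bin/bash
  # Repeatedly pick one component at random and delete one of its pods
  # while VNF onboarding continues in parallel.
  NS=onap
  COMPONENTS=(sdc so aai sdnc)
  while true; do
    target=${COMPONENTS[$RANDOM % ${#COMPONENTS[@]}]}
    pod=$(kubectl -n "$NS" get pods --no-headers | grep "$target" | shuf -n 1 | awk '{print $1}')
    [ -n "$pod" ] && kubectl -n "$NS" delete pod "$pod"
    sleep 60   # give the platform time to detect the failure and recover
  done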


VNF Onboarding and Distribution | SO | < 5 minutes | Pass

Notes: after kicking off the command, we waited 1 minute and then killed SO. The first distribution failed; after redistributing, it succeeded.


VNF Onboarding and Distribution | A&AI | < 5 minutes | Pass
  1. Killed aai-modelloader; it finished the task in 3:04 minutes.
  2. Killed two aai-cassandra pods; it finished the task in ~1 minute.
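The exact kill commands were not recorded; they likely amounted to something like the following (pod names are hypothetical placeholders in typical OOM naming):

  # Delete the model-loader pod, then two of the Cassandra replicas.
  kubectl -n onap delete pod dev-aai-modelloader-<hash>   # placeholder name
  kubectl -n onap delete pod dev-aai-cassandra-0 dev-aai-cassandra-1
  # Watch the replacements come back while the distribution completes.
  kubectl -n onap get pods -w | grep aai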

VNF Onboarding and Distribution | SDNC | < 8 minutes | Pass
  1. Ran preload using scripts.

Deleting the SDNC pod took a very long time to come back, possibly because of network issues, and left us with a very "weird" system; SDC gave us the following error:

VNF Onboarding and Distribution | SDNC | < 5 minutes | Pass
  1. Deleted one of the SDNC containers, e.g. sdnc-0.
  2. Ran health and preload (see the sketch below).
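The health and preload steps were presumably run with the OOM robot helper scripts; a sketch, assuming the scripts in oom/kubernetes/robot (exact arguments vary by release, and <vnf_name>/<module_name> are placeholders):

  ./ete-k8s.sh onap health                               # overall platform health check
  ./demo-k8s.sh onap preload <vnf_name> <module_name>    # SDNC VNF preload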



VNF Instantiation | SDC | < 2 seconds | Pass
Notes: tested by manually killing the docker container.

VNF Instantiation | VID | < 1 minute | Pass
  1. kubectl delete pod dev-vid-6d66f9b8c-9vdlt -n onap  // back in 1 minute
  2. kubectl delete pod dev-vid-mariadb-fc95657d9-wqn9s -n onap  // back in 1 minute
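The "back in 1 minute" figures were observed by watching the pod list; the same measurement can be scripted by waiting for the replacement pod to become Ready. A sketch; the app=vid label selector is an assumption, check the chart's actual labels:

  kubectl -n onap delete pod dev-vid-6d66f9b8c-9vdlt
  time kubectl -n onap wait --for=condition=Ready pod -l app=vid --timeout=10m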

VNF Instantiation | SO | 5 minutes | Pass
Notes: the SO pod restarted as part of hard-rebooting 2 of the 9 k8s VMs.

VNF Instantiation | A&AI | 20 minutes | Pass

Notes: aai-model-loader, aai-hbase, and aai-sparky-be restarted after hard-rebooting 2 more k8s VMs. Recovery probably took extra time because many other pods were restarting at the same time and needed time to converge.

VNF Instantiation | SDNC | 5 minutes | Pass
Notes: the SDNC pods restarted as part of hard-rebooting 2 of the 9 k8s VMs.

VNF Instantiation | MultiVIM | < 5 minutes | Pass
Notes: deleted the multicloud pods and verified that the new pods that came up could orchestrate VNFs as usual (a sketch follows below).
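A minimal sketch of that MultiVIM check, assuming the multicloud pods carry an app=multicloud label (an assumption; adjust to the deployed charts):

  kubectl -n onap delete pod -l app=multicloud   # label selector is an assumption
  kubectl -n onap get pods | grep multicloud     # confirm the replacements are Running
  # Then re-run a VNF instantiation (e.g. the vFW demo) to confirm orchestration still works.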

Closed Loop (in progress):
  • DCAE: the closed loop is pre-defined manually.
  • DMaaP
  • Policy: Policy documentation: Policy on OOM
  • A&AI
  • APPC
Requirement

Area: Resiliency
Priority: High
Min. Level: Level 2 – run-time projects; Level 1 – remaining projects
Stretch Goal: Level 3 – run-time projects; Level 2 – remaining projects
Level Descriptions (Abbreviated):
  • 1 – manual failure and recovery (< 30 minutes)
  • 2 – automated detection and recovery (single site) (< 30 minutes)
  • 3 – automated detection and recovery (geo redundancy)
