Casablanca

Support for failover in catastrophic situations was first available in Casablanca.

Overview

After a geo-redundant site has failed entirely and a failover activity has been completed, the original site may be recovered and joined back into the SDN-C deployment using this procedure.

Procedure

This procedure is intended for lab systems; data inconsistencies may remain after recovery, so check the site roles and overall health afterwards to confirm that the deployment is working correctly.


In an ONAP lab environment, to bring both sites back into a geo-redundant pair of clusters, run a Helm upgrade on both sites with geoEnabled=true:

Helm upgrade
helm upgrade --set sdnc.config.geoEnabled=true --recreate-pods dev local/onap --namespace onap
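
Once the upgrade has completed on each site, confirm that the SDN-C pods have been recreated and are running. A minimal check, assuming kubectl access to the onap namespace on each Kubernetes master:

Check SDN-C pods
kubectl get pods --namespace onap | grep sdnc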


On the primary Kubernetes master, make the local site active:

sdnc.makeActive
ubuntu@k8s-master:~/oom/kubernetes/sdnc/resources/geo/bin$ ./sdnc.makeActive dev
Forcing prom site  sdnc01  to become active
prom site sdnc01  should now be active
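
To confirm that prom has taken the active role, the prom pod logs can be inspected. A quick sketch, assuming kubectl is configured for the onap namespace and the pod carries the app=prom label used later in this guide:

Check prom logs
kubectl logs -l app=prom --namespace onap --tail=20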


On the primary Kubernetes master, switch voting to the local site:

switchVoting.sh
ubuntu@k8s-master:~/oom/kubernetes/sdnc/resources/geo/bin$ ./switchVoting.sh primary
success
ubuntu@k8s-master:~/oom/kubernetes/sdnc/resources/geo/bin$
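
As noted above, check the site roles and overall health on both sites after these steps. One simple sketch, assuming kubectl access on each Kubernetes master (the k8s command in the transcripts further down is assumed to be an alias for kubectl):

Check pod health
kubectl get pods --namespace onap | grep -E 'sdnc|prom|music|consul'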


Troubleshooting

After the upgrade, there may be issues that need to be manually resolved on the site that suffered the catastrophic failure.

Null MUSIC Pointer

Null entries may be left in the replicas table of the MUSIC cluster. If this occurs, they must be deleted.

Remove replica information from the MUSIC database:

Remove replica data
root@music-1:~# cqlsh $(hostname)
Connected to Test Cluster at music-1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
 
cqlsh> use prom_test_onap;
cqlsh:prom_test_onap> select * from replicas;
cqlsh:prom_test_onap> delete from replicas where id = 'sdnc01';
cqlsh:prom_test_onap> delete from replicas where id = 'sdnc02';
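
Before restarting prom, it can be verified in the same cqlsh session that the stale rows are gone (keyspace name as above; adjust for your deployment):

Verify replicas removed
cqlsh:prom_test_onap> select * from replicas;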

Note: The MUSIC server location is configured in oom/kubernetes/sdnc/charts/prom/values.yaml under Values.config.musicLocation.
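
For example, the configured value can be read directly from the chart (path as in the note above; the actual address depends on your deployment):

Look up MUSIC location
grep musicLocation ~/oom/kubernetes/sdnc/charts/prom/values.yaml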


Then delete the PROM pod, which will result in Kubernetes recreating it:

Delete PROM pod
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/charts/prom$ k8s get pods -l app=prom
NAME                        READY     STATUS        RESTARTS   AGE
dev-prom-6485f566fb-8c24m   1/1       Running       0          1m
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/charts/prom$ k8s delete po/dev-prom-6485f566fb-8c24m
pod "dev-prom-6485f566fb-8c24m" deleted

Consul Server Entry

The new Consul Server may still have an entry for the previous instance of the consul pod.

Delete the Consul server pod, which will result in Kubernetes recreating it:


Delete Consul server
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/charts/prom$ k8s get pods | grep consul
dev-consul-649df9c986-8xhxz                 1/1       Running            1          19m
dev-consul-server-667ffc8b4d-h57np          1/1       Running            0          19m
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/charts/prom$ k8s delete po/dev-consul-server-667ffc8b4d-h57np
pod "dev-consul-server-667ffc8b4d-h57np" deleted