Overview
After a geo-redundant site has failed entirely and a failover activity has been completed, the original site may be recovered and joined back into the SDN-C deployment using this procedure.
Procedure
Note |
---|
This procedure is intended for lab systems, where inconsistencies may remain after recovery. Check the site roles and health afterwards to confirm that both sites are in the expected state. |
In an ONAP lab environment, restoring both sites to a geo-redundant pair of clusters requires running helm upgrade on both sites with sdnc.config.geoEnabled=true:
Code Block |
---|
title | Helm upgrade |
---|
|
helm upgrade --set sdnc.config.geoEnabled=true --recreate-pods dev local/onap --namespace onap |
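If both clusters are reachable from one workstation, the same upgrade can be looped over the two sites. The following is only a sketch under that assumption: the kubeconfig context names `site1` and `site2`, the `CONTEXTS` and `DRY_RUN` variables, and the `run` helper are hypothetical and not part of OOM. By default it prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: run the geo-enabled Helm upgrade once per site.
# CONTEXTS holds hypothetical kubeconfig context names, one per site.
CONTEXTS="${CONTEXTS:-site1 site2}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Same command as in the procedure, pinned to one kubeconfig context.
upgrade_site() {
  run helm upgrade --kube-context "$1" \
    --set sdnc.config.geoEnabled=true --recreate-pods \
    dev local/onap --namespace onap
}

for ctx in $CONTEXTS; do
  upgrade_site "$ctx" || exit 1
done
```

Set `DRY_RUN=0` only after confirming that the printed commands match the release name and chart used in your lab.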
On the primary Kubernetes master, make the local site active:
Code Block |
---|
title | sdnc.makeActive |
---|
|
ubuntu@k8s-master:~/oom/kubernetes/sdnc/resources/geo/bin$ ./sdnc.makeActive dev
Forcing prom site sdnc01 to become active
prom site sdnc01 should now be active |
On the primary Kubernetes master, switch voting to the local site:
Code Block |
---|
title | switchVoting.sh |
---|
|
ubuntu@k8s-master:~/oom/kubernetes/sdnc/resources/geo/bin$ ./switchVoting.sh primary
success
ubuntu@k8s-master:~/oom/kubernetes/sdnc/resources/geo/bin$ |
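The two primary-site steps can be chained in a small wrapper that stops at the first failure. This is a minimal sketch, assuming the scripts live in the directory shown in the transcripts, that the release name is `dev`, and that `switchVoting.sh` prints `success` on completion as it does above. The `GEO_BIN`, `RELEASE`, and `DRY_RUN` variables and the helper functions are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: make the local site active, then move voting to it.
GEO_BIN="${GEO_BIN:-$HOME/oom/kubernetes/sdnc/resources/geo/bin}"
RELEASE="${RELEASE:-dev}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Succeeds only if the given output contains the exact line "success",
# which is what the switchVoting.sh transcript shows.
voting_switched() {
  printf '%s\n' "$1" | grep -qx 'success'
}

recover_primary() {
  run cd "$GEO_BIN" || return 1
  run ./sdnc.makeActive "$RELEASE" || return 1
  out=$(run ./switchVoting.sh primary); status=$?
  printf '%s\n' "$out"
  [ "$status" -eq 0 ] || return 1
  # In dry-run mode there is no real script output to check.
  [ "$DRY_RUN" = "1" ] || voting_switched "$out"
}

recover_primary
```

Whether the two scripts return a non-zero exit status on failure is not documented here, which is why the sketch also checks the printed output.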
Troubleshooting
After the upgrade, two known bugs consistently appear on the site that took over during the catastrophic failure. Both need to be addressed.
Null MUSIC Pointer
Null pointers end up in the replicas table on the MUSIC server and need to be deleted. The MUSIC server location is set in oom/kubernetes/sdnc/charts/prom/values.yaml under Values.config.musicLocation.
Remove replica information in MUSIC:
Code Block |
---|
title | Remove replica data |
---|
|
root@music-1:~# cqlsh $(hostname)
Connected to Test Cluster at music-1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> use prom_test_onap;
cqlsh:prom_test_onap> select * from replicas;
cqlsh:prom_test_onap> delete from replicas where id = 'sdnc01';
cqlsh:prom_test_onap> delete from replicas where id = 'sdnc02'; |
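The same cleanup can be done without an interactive session by passing each statement to cqlsh with its `-e` option. This is a minimal sketch, assuming the keyspace `prom_test_onap` and the site ids `sdnc01` and `sdnc02` from the transcript, and that it runs on the MUSIC host. The `KEYSPACE` and `DRY_RUN` variables and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: remove stale PROM replica rows from MUSIC non-interactively.
KEYSPACE="${KEYSPACE:-prom_test_onap}"   # keyspace name taken from the transcript
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print the cqlsh commands only

# Emit one DELETE statement per site id given as an argument.
replica_cleanup_cql() {
  for id in "$@"; do
    printf "DELETE FROM %s.replicas WHERE id = '%s';\n" "$KEYSPACE" "$id"
  done
}

# Run (or print) one cqlsh invocation per site id.
cleanup_replicas() {
  for id in "$@"; do
    stmt=$(replica_cleanup_cql "$id")
    if [ "$DRY_RUN" = "1" ]; then
      echo "+ cqlsh \$(hostname) -e \"$stmt\""
    else
      cqlsh "$(hostname)" -e "$stmt" || return 1
    fi
  done
}

cleanup_replicas sdnc01 sdnc02
```

Inspect the table with a select first, as in the transcript, so that only rows belonging to the affected sites are removed.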
Then delete the PROM pod on the master; it will restart with the new MUSIC reference (k8s in the transcripts below appears to be a local alias for kubectl):
Code Block |
---|
title | Delete PROM pod |
---|
|
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/charts/prom$ k8s get pods -l app=prom
NAME READY STATUS RESTARTS AGE
dev-prom-6485f566fb-8c24m 1/1 Running 0 1m
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/charts/prom$ k8s delete po/dev-prom-6485f566fb-8c24m
pod "dev-prom-6485f566fb-8c24m" deleted |
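Because the pod name changes on every restart, it can be easier to delete by label. This is a sketch only, assuming that `k8s` in the transcript is an alias for `kubectl` against the `onap` namespace and that the `app=prom` label shown above selects only the PROM pod. The `NAMESPACE`, `SELECTOR`, and `DRY_RUN` variables and the `run` helper are hypothetical.

```shell
#!/bin/sh
# Sketch: restart PROM by deleting its pod via label selector.
NAMESPACE="${NAMESPACE:-onap}"
SELECTOR="${SELECTOR:-app=prom}"   # label taken from the transcript
DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restart_prom() {
  # The deployment recreates the pod after deletion.
  run kubectl -n "$NAMESPACE" delete pod -l "$SELECTOR" || return 1
  # List the pods again to see the replacement come up.
  run kubectl -n "$NAMESPACE" get pods -l "$SELECTOR"
}

restart_prom
```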
Consul Server Entry
The new Consul server may still hold an entry for the previous instance of the Consul pod. To clear it, delete the Consul server pod:
Code Block |
---|
title | Delete Consul server |
---|
|
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/charts/prom$ k8s get pods | grep consul
dev-consul-649df9c986-8xhxz 1/1 Running 1 19m
dev-consul-server-667ffc8b4d-h57np 1/1 Running 0 19m
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/charts/prom$ k8s delete po/dev-consul-server-667ffc8b4d-h57np
pod "dev-consul-server-667ffc8b4d-h57np" deleted |
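The lookup and delete can be combined so that the generated pod name does not have to be copied by hand. This is a rough sketch, assuming that `k8s` is an alias for `kubectl` against the `onap` namespace and that Consul server pod names contain `-consul-server-` as in the listing above. The `NAMESPACE` and `DRY_RUN` variables and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: delete the Consul server pod by matching its name.
NAMESPACE="${NAMESPACE:-onap}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = describe the action only, 0 = execute it

# Keep only Consul server pods from `kubectl get pods -o name` lines on stdin.
select_consul_server() {
  grep -- '-consul-server-'
}

restart_consul_server() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ would delete pods in namespace $NAMESPACE whose name contains -consul-server-"
    return 0
  fi
  kubectl -n "$NAMESPACE" get pods -o name | select_consul_server |
    while IFS= read -r pod; do
      # Each $pod has the form pod/<name>, which kubectl delete accepts directly.
      kubectl -n "$NAMESPACE" delete "$pod" </dev/null
    done
}

restart_consul_server
```

The name filter deliberately skips the plain Consul agent pod (dev-consul-... in the listing), which this step does not ask you to delete.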
Step 3