
  1. Manually mount a volume
    Persistence: manually add the volume section to the deployment (NFS mode)
    spec:
      containers:
      - image: hub.baidubce.com/duanshuaixing/tools:v3
        imagePullPolicy: IfNotPresent
        name: test-volume
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /root/
          name: nfs-test
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: nfs-test
        nfs:
          path: /dockerdata-nfs/test-volume/
          server: 10.0.0.7
  2. Restart the node to check the nfs automount

    Restart the node and check whether the NFS client auto-mounts the share; if it does not, mount it manually.

    df -Th |grep nfs

    sudo mount $MASTER_IP:/dockerdata-nfs /dockerdata-nfs/
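
    To make the mount persist across reboots, an /etc/fstab entry can also be added (a minimal sketch; the NFS server address and mount options are assumptions, adjust them to your environment):

        # /etc/fstab - mount the shared NFS export at boot
        10.0.0.7:/dockerdata-nfs  /dockerdata-nfs  nfs  auto,nofail,noatime  0  0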

  3. Reinstall One Project 

    1. Delete a module (take so as an example)
        
        helm delete dev-so --purge
    
    2. If the delete fails, you can manually delete the leftover pvc, pv, deployment, configmap, statefulset and job objects (see the sketch after this list)
    
    3. Install a module
        
        cd oom/kubernetes
        make so
        make onap
        helm install local/so --namespace onap --name dev-so
        
         or (if a docker proxy repository is used)
         helm install local/so --namespace onap --name dev-so --set global.repository=172.30.1.66:10001
         
        Use a proxy repository when installing a module, or set the image pull policy for the module
        helm install local/so --namespace onap --name dev-so --set global.repository=172.30.1.66:10001 --set so.pullPolicy=IfNotPresent
         
        
    4. Clear the /dockerdata-nfs/dev-so directory (it can be moved to a /bak directory)
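
        A minimal sketch for step 2, assuming the release is dev-so and all of its leftover objects carry the release name:

            for kind in deployment statefulset job configmap pvc pv; do
                kubectl -n onap get $kind | grep dev-so | awk '{print $1}' | xargs -r kubectl -n onap delete $kind
            done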
  4. Helm has no deploy parameter

    The deploy parameter comes from the OOM helm plugins; if helm reports it is missing, copy the plugins into the helm home directory:

    cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm/

  5. Helm list shows no releases

    cp /root/oom/kubernetes/onap/values.yaml /root/integration-override.yaml

    helm deploy dev local/onap -f /root/oom/kubernetes/onap/resources/environments/public-cloud.yaml -f /root/integration-override.yaml --namespace onap --verbose

  6. Force delete all pods
     kubectl delete pod $(kubectl get pod -n onap |tail -n +2 |awk '{print $1}') -n onap --grace-period=0 --force
  7. Copy file to pod

    Problem: copying a file from the local machine into a pod, and specifying the destination path.

    This can be worked around temporarily by installing the lrzsz tools in the container, or by running the docker cp command on the node where the pod runs (see the sketch below).
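
    A sketch of both options (the pod name, container id and paths are placeholders for illustration):

        # option 1: kubectl cp from the machine where kubectl runs
        kubectl cp ./test.txt onap/dev-uui-uui-server-67fc49b6d9-szr7t:/tmp/test.txt

        # option 2: docker cp on the node that hosts the pod
        docker ps | grep uui-server            # find the container id
        docker cp ./test.txt <container_id>:/tmp/test.txt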

  8. Check the port exposed by the pod

    1. Check the node where the pod is running
    
           kubectl get pod -n onap -o wide|grep uui-server
    
     2. Check the type of the pod's controller (a ReplicaSet corresponds to a deployment, a StatefulSet to a statefulset)
    
            kubectl -n onap describe pod dev-uui-uui-server-67fc49b6d9-szr7t|grep Controlled
    
     3. Check the service (and its nodePort) corresponding to the pod
    
            kubectl get svc -n onap |grep uui-server
    
    4. Access the pod via the floating IP of the node where it runs and the 30000+ nodePort (see the example below)
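
        For example, if the service shows nodePort 30279 and the node's floating IP is 10.0.0.11 (both values are placeholders for illustration):

            curl http://10.0.0.11:30279/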
  9. Check pod through the port 

    1. Find the corresponding service by its exposed port (take port 30399 as an example)
    
           kubectl -n onap get svc |grep 30399
    
     2. Check the backend pod IP behind this service
    
            kubectl get ep uui-server -n onap
    
     3. Find the corresponding pod and node by the pod IP
    
          kubectl get pod -n onap -o wide|grep 10.42.67.201
    
     4. Look up the node's address
    
           cat /etc/hosts |grep node4
  10. Can't start ansible-server

    The problem is caused by DNS names failing to resolve; it can be solved by redeploying the DNS configmap (a sketch of such a configmap follows the command).
    kubectl replace -f kube-dns-configmap.yaml
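
    A minimal sketch of what kube-dns-configmap.yaml could contain, assuming the cluster uses kube-dns and the upstream DNS server address is only an example value:

        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: kube-dns
          namespace: kube-system
        data:
          upstreamNameservers: |
            ["8.8.8.8"]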

  11. Close the health check to avoid restarting

    Delete or comment out the following probe sections in the deployment or statefulset; note that the pod will be restarted after the change (a command sketch follows the snippet).

    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /manage/health
        port: 8084
        scheme: HTTP
      initialDelaySeconds: 600
      periodSeconds: 60
      successThreshold: 1
      timeoutSeconds: 10
    
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 8482
      timeoutSeconds: 1
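    
    A sketch of how to apply the change (the deployment name here is an example value):
    
        kubectl -n onap edit deployment dev-uui-uui-server
        # remove or comment the livenessProbe / readinessProbe blocks and save;
        # the pod will then be recreated without the probes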
  12. Restart the container on the node to check whether new files are lost when the pod health check is enabled/disabled

    a. With the health check enabled in the deployment, add a test file to the pod and restart the container on the node.

          Conclusion: after the container is restarted on the node, a new container is created and the original test file in the pod is lost.


    b. With the health check disabled in the deployment, add a test file to the pod and restart the container on the node.

         Conclusion: when the container is restarted (stopped and started), the data in the pod is not lost.

  13. 500 error when SDC distributes a package

    Try to restart/reinstall dmaap. Before you restart or reinstall, delete the dev-dmaap directory in NFS (a command sketch follows). If the error still happens, try to restart/reinstall SDC.
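
    A minimal sketch of the dmaap reinstall, assuming the release name is dev-dmaap and NFS is mounted at /dockerdata-nfs:

        helm delete dev-dmaap --purge
        mv /dockerdata-nfs/dev-dmaap /bak/
        cd ~/oom/kubernetes && make dmaap && make onap
        helm install local/dmaap --namespace onap --name dev-dmaap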

  14. SDC pod can't start 


    There are dependencies between pods

    The pod that ultimately affects the other pods is dev-sdc-sdc-cs

    If SDC is redeployed, manually remove /dockerdata-nfs/dev-sdc/
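
    A minimal sketch for redeploying SDC, assuming the release name is dev-sdc:

        helm delete dev-sdc --purge
        mv /dockerdata-nfs/dev-sdc /bak/
        cd ~/oom/kubernetes && make sdc && make onap
        helm install local/sdc --namespace onap --name dev-sdc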

  15. Sdnc-dgbuilder pod can't start

    The pod is in running state, but the server inside it does not start. Inside the container:
    
    
    # set the registry for npm modules
    npm set registry https://registry.npm.taobao.org
    
    # set the source for node-gyp compile dependencies
    npm set disturl https://npm.taobao.org/dist
    
    # clean the npm cache
    npm cache clean
    
    ./start.sh sdnc1.0 && wait &
    
    
  16. Holmes doesn't install automatically

    Holmes is not deployed automatically; deploy it manually as follows:
    
    1) Enter the dcae bootstrap pod
    >>kubectl exec -it -n onap dev-dcaegen2-dcae-bootstrap-776cf86d49-mxzq6 /bin/bash
    
    
    2) Delete holmes components
    >>cfy uninstall holmes_rules
    >>cfy deployments delete -f holmes_rules
    >>cfy blueprints delete holmes_rules
    >>cfy blueprints validate k8s-holmes-rules.yaml
    
    
    >>cfy uninstall holmes_engine
    >>cfy deployments delete -f holmes_engine
    >>cfy blueprints delete holmes_engine
    
    
    3) Reinstall holmes components
    
    
    
    >>cfy blueprints upload -b holmes_rules /blueprints/k8s-holmes-rules.yaml
    >>cfy deployments create -b holmes_rules -i /inputs/k8s-holmes_rules-inputs.yaml holmes_rules
    >>cfy executions start -d holmes_rules install 
    
    
    
    >>cfy blueprints upload -b holmes_engine /blueprints/k8s-holmes-engine.yaml
    >>cfy deployments create -b holmes_engine -i /inputs/k8s-holmes_engine-inputs.yaml holmes_engine
    >>cfy executions start -d holmes_engine install
    
    
    
    If reinstalling holmes fails, the following error may occur:
    [root@dev-dcaegen2-dcae-bootstrap-9b6b4fb77-fnsdk blueprints]# cfy deployments create -b holmes_rules -i /inputs/k8s-holmes_rules-inputs.yaml holmes_rules
    Creating new deployment from blueprint holmes_rules...
    Deployment created. The deployment's id is holmes_rules
    [root@dev-dcaegen2-dcae-bootstrap-9b6b4fb77-fnsdk blueprints]# cfy executions start -d holmes_rules install
    Executing workflow install on deployment holmes_rules [timeout=900 seconds]
    2018-11-19 10:34:28.961  CFY <holmes_rules> Starting 'install' workflow execution
    2018-11-19 10:34:29.541  CFY <holmes_rules> [pgaasvm_p1aax2] Creating node
    2018-11-19 10:34:30.550  CFY <holmes_rules> [pgaasvm_p1aax2.create] Sending task 'pgaas.pgaas_plugin.create_database'
    2018-11-19 10:34:30.550  CFY <holmes_rules> [pgaasvm_p1aax2.create] Task started 'pgaas.pgaas_plugin.create_database'
    2018-11-19 10:34:31.232  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: create_database(holmes)
    2018-11-19 10:34:32.237  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Error: [Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    2018-11-19 10:34:32.237  LOG <holmes_rules> [pgaasvm_p1aax2.create] ERROR: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    2018-11-19 10:34:33.241  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Error: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    2018-11-19 10:34:32.237  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Stack: Traceback (most recent call last):
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 337, in getclusterinfo
        with open(fn, 'r') as f:
    IOError: [Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    
    2018-11-19 10:34:33.241  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Stack: Traceback (most recent call last):
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 441, in create_database
        info = dbgetinfo(ctx)
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 424, in dbgetinfo
        ret = getclusterinfo(wfqdn, True, '', '', [])
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 344, in getclusterinfo
        raiseNonRecoverableError('Cluster must be deployed when using an existing cluster. Check your domain name: fqdn={0}, err={1}'.format(safestr(wfqdn),e))
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 167, in raiseNonRecoverableError
        raise NonRecoverableError(msg)
    NonRecoverableError: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    
    2018-11-19 10:34:33.238  CFY <holmes_rules> [pgaasvm_p1aax2.create] Task failed 'pgaas.pgaas_plugin.create_database' -> Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    2018-11-19 10:34:33.553  CFY <holmes_rules> 'install' workflow execution failed: RuntimeError: Workflow failed: Task failed 'pgaas.pgaas_plugin.create_database' -> Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    Execution of workflow install for deployment holmes_rules failed. [error=Traceback (most recent call last):
      File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/dispatch.py", line 548, in _remote_workflow_child_thread
    
    
    
    You need to execute the following steps first:
    
    1) Delete the holmes components as in step 2 above
    
    2) Reset the Postgres instance
    a. Uninstall the pg initialization blueprint
    cfy uninstall pgaas_initdb
    cfy deployments delete -f pgaas_initdb
    cfy blueprints delete pgaas_initdb
    
    b.  Reset the password of PG via psql
    kubectl exec -it -n onap dev-dcaegen2-dcae-db-0 /bin/sh
    bash-4.2$ psql
    postgres=# ALTER ROLE "postgres" WITH PASSWORD 'onapdemodb';
    ALTER ROLE
    postgres-# \q
    
    c. Deploy PG initialization blueprint
    cfy blueprints upload -b pgaas_initdb /blueprints/k8s-pgaas-initdb.yaml
    cfy deployments create -b pgaas_initdb -i /inputs/k8s-pgaas-initdb-inputs.yaml pgaas_initdb
    cfy executions start -d pgaas_initdb install
    
    
    3) Reinstall holmes as in step 3 above
    
    
  17. DMaaP restart sequence

    Start dmaap, zookeeper, kafka and message-router in sequence, with an interval of 1 minute between each (see the sketch below).
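
    A minimal sketch, assuming the dmaap pods live in the onap namespace (the pod names below are placeholders; deleting a pod makes its controller recreate it):

        kubectl -n onap get pod | grep dmaap
        kubectl -n onap delete pod <zookeeper-pod>
        sleep 60
        kubectl -n onap delete pod <kafka-pod>
        sleep 60
        kubectl -n onap delete pod <message-router-pod>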
  18. dev-consul-consul takes up a lot of disk space

    Problem: Node disk alarm

    Troubleshooting: checking the disk usage under /var/lib/docker/ with du -hs *, the problem turns out to be the relatively large usage of the following directory:

    /var/lib/docker/aufs/diff/b759b23cb79cff6cecdf0e44f7d9a1fb03db018f0c5c48696edcf7e23e2d045b/home/consul/.kube/http-cache/.diskv-temp/

    With kubectl -n onap get pod -o wide | grep consul, confirm that the pod is dev-consul-consul-6d7675f5b5-sxrmq, and double-check by running kubectl exec into this pod.

    Solution: delete all files under /home/consul/.kube/http-cache/.diskv-temp/ inside the pod (see the sketch below).
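
    A minimal sketch of the cleanup, using the pod name from this example:

        kubectl -n onap exec dev-consul-consul-6d7675f5b5-sxrmq -- sh -c 'rm -rf /home/consul/.kube/http-cache/.diskv-temp/*'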

    The following is part of the file listing in the example.

  19. Can't delete statefulset

    If you are using kubectl 1.8.0, it needs to be upgraded to kubectl version 1.9.0 or above.
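
    To check which client version is in use:

        kubectl version --short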

  20. Rollback after image update

    1. Check the update history of the deployment
    kubectl rollout history deployment nginx-deployment
    
    2. Roll back to the previous version
    
    kubectl rollout undo deployment nginx-deployment
    
    3. Roll back to a specified version
    kubectl rollout undo deployment nginx-deployment --to-revision=2


  21. Update images in oom with the docker-manifest.csv from the integration repo

    #!/usr/bin/env bash
    
    # clone the casablanca branch of the integration repo and copy its image manifest into oom
    cd $HOME
    git clone -b casablanca https://gerrit.onap.org/r/integration
    cp $HOME/integration/version-manifest/src/main/resources/docker-manifest.csv $HOME/oom/
    
    # for every "image,tag" line of the manifest (skipping the header),
    # rewrite the tag of that image in every values.yaml under oom
    version_new="$HOME/oom/docker-manifest.csv"
    for line in $(tail -n +2 $version_new); do
            image=$(echo $line | cut -d , -f 1)
            tag=$(echo $line | cut -s -d , -f 2)
            perl -p -i -e "s|$image(:.*$\|$)|$image:$tag|g" $(find $HOME/oom/ -name values.yaml)
    done
    
    
  22. Delete ONAP

    1. Delete using helm
    
        helm delete $(helm list|tail -n +2|awk '{print $1}') --purge &
        
     2. Delete the remaining API objects in the onap namespace in k8s
      
         kubectl -n onap get deployments|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete deployments
         kubectl -n onap get statefulset|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete statefulset
         kubectl -n onap get jobs|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete jobs
         kubectl -n onap get pvc|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete pvc
         kubectl -n onap get secrets|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete secrets
         kubectl -n onap get configmaps|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete configmaps
         kubectl -n onap get svc|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete svc
       
    3. Delete the data in NFS (a sketch follows)
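
        A minimal sketch, assuming the NFS share is mounted at /dockerdata-nfs (move the data to a backup directory instead of deleting it if it may still be needed):

            mv /dockerdata-nfs/* /bak/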
    
    
  23. Missing svc or yaml configuration file

    1. Export it from the current environment (take the vfc-catalog logging configmap as an example)
    
        kubectl -n onap get cm dev-vfc-vfc-catalog-logging-configmap --export -o yaml >>dev-vfc-vfc-catalog-logging-configmap.yaml
        
    2. Apply it: kubectl -n onap apply -f dev-vfc-vfc-catalog-logging-configmap.yaml
    
    3. Restart the pod
    
    Note: the nodePort field specified in a service deployed by helm is not exported this way; the reason is still being investigated.
  24. Calling multiple k8s API objects at one time causes a hang

    Problem: when creating 60 services at a time, the kubectl command hangs during execution.

    Reason: the Java process in the Rancher server container runs out of memory (OOM).

    Temporary solution: manipulate the API objects in batches.

    Permanent solution: modify the memory limit in the Java parameters.


    Specify the Xmx value when installing; the default is 4096M, and it can be increased to 8192M:

    docker run -d --restart=unless-stopped -e JAVA_OPTS="-Xmx8192m" -p 8080:8080 --name rancher_server rancher/server:v$RANCHER_VERSION

  25. Filter image version

    Filter the image versions used in oom/kubernetes (take VFC as an example)

    grep -r -E 'image|Image:' ~/oom/kubernetes/|awk '{print $2}'|grep onap|grep vfc

  26. Service Port configuration

     ports:
      - port: 9090
        protocol: TCP
        targetPort: 80
        nodePort: 32000
    
    
    targetPort is the port on which the container (docker) provides the service
    
    port is the port through which the service is accessed inside the cluster
    
    nodePort exposes the service on every node in NodePort mode; by default it is assigned randomly from the range 30000-32767 (see the example below)
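    
    A sketch of how the three ports are used with the example above (the service name and node IP are placeholders):
    
        # inside the cluster: the service port 9090 forwards to container port 80
        curl http://<service-name>.onap:9090/
    
        # from outside the cluster: any node IP plus the nodePort
        curl http://<node-ip>:32000/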
    
    