
  1. Manually mount a volume
    Persistence: manually add the volume section to the deployment (NFS mode)
    spec:
      containers:
      - image: hub.baidubce.com/duanshuaixing/tools:v3
        imagePullPolicy: IfNotPresent
        name: test-volume
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /root/
          name: nfs-test
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: nfs-test
        nfs:
          path: /dockerdata-nfs/test-volume/
          server: 10.0.0.7
  2. Restart the node to check the nfs automount

    Restart the node and check whether the NFS client auto-mounts the share; if it does not, mount it manually.

    df -Th |grep nfs

    sudo mount $MASTER_IP:/dockerdata-nfs /dockerdata-nfs/
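
    To make the mount persist across reboots, an /etc/fstab entry can also be added (a minimal sketch; the NFS server address and mount options are assumptions, adjust them to your environment):

        # /etc/fstab - mount the shared NFS export at boot
        10.0.0.7:/dockerdata-nfs  /dockerdata-nfs  nfs  auto,nofail,noatime  0  0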

  3. Reinstall One Project 

    1. Delete a module (take so as an example)
        
        helm delete dev-so --purge
    
    2. If the delete fails, you can manually delete the leftover pvc, pv, deployment, configmap, statefulset and job objects (see the sketch after this list)
    
    3. Install a module
        
        cd oom/kubernetes
        make so
        make onap
        helm install local/so --namespace onap --name dev-so
        
         or (if a docker proxy repository is used)
         helm install local/so --namespace onap --name dev-so --set global.repository=172.30.1.66:10001
         
        Use a proxy repository when installing a module, or set the image pull policy for the module
        helm install local/so --namespace onap --name dev-so --set global.repository=172.30.1.66:10001 --set so.pullPolicy=IfNotPresent
         
        
    4. Clear the /dockerdata-nfs/dev-so directory (it can be moved to a /bak directory)
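
        A minimal sketch for step 2, assuming the release is dev-so and all of its leftover objects carry the release name:

            for kind in deployment statefulset job configmap pvc pv; do
                kubectl -n onap get $kind | grep dev-so | awk '{print $1}' | xargs -r kubectl -n onap delete $kind
            done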
  4. Helm has no deploy parameter

    The deploy parameter comes from the OOM helm plugins; if helm reports it is missing, copy the plugins into the helm home directory:

    cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm/

  5. Helm list shows no releases

    cp /root/oom/kubernetes/onap/values.yaml /root/integration-override.yaml

    helm deploy dev local/onap -f /root/oom/kubernetes/onap/resources/environments/public-cloud.yaml -f /root/integration-override.yaml --namespace onap --verbose

  6. Force delete all pods
     kubectl delete pod $(kubectl get pod -n onap |tail -n +2 |awk '{print $1}') -n onap --grace-period=0 --force
  7. Copy file to pod

    Problem: copying a file from the local machine into a pod, and specifying the destination path.

    This can be worked around temporarily by installing the lrzsz tools in the container, or by running the docker cp command on the node where the pod runs (see the sketch below).
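
    A sketch of both options (the pod name, container id and paths are placeholders for illustration):

        # option 1: kubectl cp from the machine where kubectl runs
        kubectl cp ./test.txt onap/dev-uui-uui-server-67fc49b6d9-szr7t:/tmp/test.txt

        # option 2: docker cp on the node that hosts the pod
        docker ps | grep uui-server            # find the container id
        docker cp ./test.txt <container_id>:/tmp/test.txt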

  8. Check the port exposed by the pod

    1. Check the node where the pod is running
    
           kubectl get pod -n onap -o wide|grep uui-server
    
     2. Check the type of the pod's controller (a ReplicaSet corresponds to a deployment, a StatefulSet to a statefulset)
    
            kubectl -n onap describe pod dev-uui-uui-server-67fc49b6d9-szr7t|grep Controlled
    
     3. Check the service (and its nodePort) corresponding to the pod
    
            kubectl get svc -n onap |grep uui-server
    
    4. Access the pod via the floating IP of the node where it runs and the 30000+ nodePort (see the example below)
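
        For example, if the service shows nodePort 30279 and the node's floating IP is 10.0.0.11 (both values are placeholders for illustration):

            curl http://10.0.0.11:30279/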
  9. Check pod through the port 

    1. Find the corresponding service by its exposed port (take port 30399 as an example)
    
           kubectl -n onap get svc |grep 30399
    
     2. Check the backend pod IP behind this service
    
            kubectl get ep uui-server -n onap
    
     3. Find the corresponding pod and node by the pod IP
    
          kubectl get pod -n onap -o wide|grep 10.42.67.201
    
     4. Look up the node's address
    
           cat /etc/hosts |grep node4
  10. Can't start ansible-server

    The problem is caused by DNS names failing to resolve; it can be solved by redeploying the DNS configmap (a sketch of such a configmap follows the command).
    kubectl replace -f kube-dns-configmap.yaml
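
    A minimal sketch of what kube-dns-configmap.yaml could contain, assuming the cluster uses kube-dns and the upstream DNS server address is only an example value:

        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: kube-dns
          namespace: kube-system
        data:
          upstreamNameservers: |
            ["8.8.8.8"]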

  11. Close the health check to avoid restarting

    Delete or comment out the following probe sections in the deployment or statefulset; note that the pod will be restarted after the change (a command sketch follows the snippet).

    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /manage/health
        port: 8084
        scheme: HTTP
      initialDelaySeconds: 600
      periodSeconds: 60
      successThreshold: 1
      timeoutSeconds: 10
    
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 8482
      timeoutSeconds: 1
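    
    A sketch of how to apply the change (the deployment name here is an example value):
    
        kubectl -n onap edit deployment dev-uui-uui-server
        # remove or comment the livenessProbe / readinessProbe blocks and save;
        # the pod will then be recreated without the probes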
  12. Restart the container on the node to check whether new files are lost when the pod health check is enabled/disabled

    a. With the health check enabled in the deployment, add a test file to the pod and restart the container on the node.

          Conclusion: after the container is restarted on the node, a new container is created and the original test file in the pod is lost.


    b. With the health check disabled in the deployment, add a test file to the pod and restart the container on the node.

         Conclusion: when the container is restarted (stopped and started), the data in the pod is not lost.

  13. 500 error when SDC distributes a package

    Try to restart/reinstall dmaap. Before you restart or reinstall, delete the dev-dmaap directory in NFS (a command sketch follows). If the error still happens, try to restart/reinstall SDC.
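
    A minimal sketch of the dmaap reinstall, assuming the release name is dev-dmaap and NFS is mounted at /dockerdata-nfs:

        helm delete dev-dmaap --purge
        mv /dockerdata-nfs/dev-dmaap /bak/
        cd ~/oom/kubernetes && make dmaap && make onap
        helm install local/dmaap --namespace onap --name dev-dmaap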

  14. SDC pod can't start 


    There are dependencies between pods

    The pod that ultimately affects the other pods is dev-sdc-sdc-cs

    If SDC is redeployed, manually remove /dockerdata-nfs/dev-sdc/
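
    A minimal sketch for redeploying SDC, assuming the release name is dev-sdc:

        helm delete dev-sdc --purge
        mv /dockerdata-nfs/dev-sdc /bak/
        cd ~/oom/kubernetes && make sdc && make onap
        helm install local/sdc --namespace onap --name dev-sdc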

  15. Sdnc-dgbuilder pod can't start

    The pod is in running state, but the server inside it does not start. Inside the container:
    
    
    # set the registry for npm modules
    npm set registry https://registry.npm.taobao.org
    
    # set the source for node-gyp compile dependencies
    npm set disturl https://npm.taobao.org/dist
    
    # clean the npm cache
    npm cache clean
    
    ./start.sh sdnc1.0 && wait &
    
    
  16. Holmes doesn't install automatically

    Holmes is not deployed automatically; deploy it manually as follows:
    
    1) Enter the dcae bootstrap pod
    >>kubectl exec -it -n onap dev-dcaegen2-dcae-bootstrap-776cf86d49-mxzq6 /bin/bash
    
    
    2) Delete holmes components
    >>cfy uninstall holmes_rules
    >>cfy deployments delete -f holmes_rules
    >>cfy blueprints delete holmes_rules
    >>cfy blueprints validate k8s-holmes-rules.yaml
    
    
    >>cfy uninstall holmes_engine
    >>cfy deployments delete -f holmes_engine
    >>cfy blueprints delete holmes_engine
    
    
    3) Reinstall holmes components
    
    
    
    >>cfy blueprints upload -b holmes_rules /blueprints/k8s-holmes-rules.yaml
    >>cfy deployments create -b holmes_rules -i /inputs/k8s-holmes_rules-inputs.yaml holmes_rules
    >>cfy executions start -d holmes_rules install 
    
    
    
    >>cfy blueprints upload -b holmes_engine /blueprints/k8s-holmes-engine.yaml
    >>cfy deployments create -b holmes_engine -i /inputs/k8s-holmes_engine-inputs.yaml holmes_engine
    >>cfy executions start -d holmes_engine install
    
    
    
    If reinstalling holmes fails, the following error may occur:
    [root@dev-dcaegen2-dcae-bootstrap-9b6b4fb77-fnsdk blueprints]# cfy deployments create -b holmes_rules -i /inputs/k8s-holmes_rules-inputs.yaml holmes_rules
    Creating new deployment from blueprint holmes_rules...
    Deployment created. The deployment's id is holmes_rules
    [root@dev-dcaegen2-dcae-bootstrap-9b6b4fb77-fnsdk blueprints]# cfy executions start -d holmes_rules install
    Executing workflow install on deployment holmes_rules [timeout=900 seconds]
    2018-11-19 10:34:28.961  CFY <holmes_rules> Starting 'install' workflow execution
    2018-11-19 10:34:29.541  CFY <holmes_rules> [pgaasvm_p1aax2] Creating node
    2018-11-19 10:34:30.550  CFY <holmes_rules> [pgaasvm_p1aax2.create] Sending task 'pgaas.pgaas_plugin.create_database'
    2018-11-19 10:34:30.550  CFY <holmes_rules> [pgaasvm_p1aax2.create] Task started 'pgaas.pgaas_plugin.create_database'
    2018-11-19 10:34:31.232  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: create_database(holmes)
    2018-11-19 10:34:32.237  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Error: [Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    2018-11-19 10:34:32.237  LOG <holmes_rules> [pgaasvm_p1aax2.create] ERROR: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    2018-11-19 10:34:33.241  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Error: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    2018-11-19 10:34:32.237  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Stack: Traceback (most recent call last):
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 337, in getclusterinfo
        with open(fn, 'r') as f:
    IOError: [Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    
    2018-11-19 10:34:33.241  LOG <holmes_rules> [pgaasvm_p1aax2.create] WARNING: Stack: Traceback (most recent call last):
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 441, in create_database
        info = dbgetinfo(ctx)
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 424, in dbgetinfo
        ret = getclusterinfo(wfqdn, True, '', '', [])
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 344, in getclusterinfo
        raiseNonRecoverableError('Cluster must be deployed when using an existing cluster. Check your domain name: fqdn={0}, err={1}'.format(safestr(wfqdn),e))
      File "/opt/mgmtworker/env/plugins/default_tenant/pgaas-1.1.0/lib/python2.7/site-packages/pgaas/pgaas_plugin.py", line 167, in raiseNonRecoverableError
        raise NonRecoverableError(msg)
    NonRecoverableError: Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    
    2018-11-19 10:34:33.238  CFY <holmes_rules> [pgaasvm_p1aax2.create] Task failed 'pgaas.pgaas_plugin.create_database' -> Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    2018-11-19 10:34:33.553  CFY <holmes_rules> 'install' workflow execution failed: RuntimeError: Workflow failed: Task failed 'pgaas.pgaas_plugin.create_database' -> Cluster must be deployed when using an existing cluster. Check your domain name: fqdn=dcae-pg-primary.onap, err=[Errno 2] No such file or directory: '/opt/manager/resources/pgaas/dcae-pg-primary.onap'
    Execution of workflow install for deployment holmes_rules failed. [error=Traceback (most recent call last):
      File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/dispatch.py", line 548, in _remote_workflow_child_thread
    
    
    
    You need to execute the following steps first:
    
    1) Delete the holmes components as in step 2 above
    
    2) Reset the Postgres instance
    a. Uninstall the pg initialization blueprint
    cfy uninstall pgaas_initdb
    cfy deployments delete -f pgaas_initdb
    cfy blueprints delete pgaas_initdb
    
    b.  Reset the password of PG via psql
    kubectl exec -it -n onap dev-dcaegen2-dcae-db-0 /bin/sh
    bash-4.2$ psql
    postgres=# ALTER ROLE "postgres" WITH PASSWORD 'onapdemodb';
    ALTER ROLE
    postgres-# \q
    
    c. Deploy PG initialization blueprint
    cfy blueprints upload -b pgaas_initdb /blueprints/k8s-pgaas-initdb.yaml
    cfy deployments create -b pgaas_initdb -i /inputs/k8s-pgaas-initdb-inputs.yaml pgaas_initdb
    cfy executions start -d pgaas_initdb install
    
    
    3) Reinstall holmes as in step 3 above
    
    
  17. DMaaP restart sequence

    Start dmaap, zookeeper, kafka and message-router in sequence, with an interval of 1 minute between each (see the sketch below).
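
    A minimal sketch, assuming the dmaap pods live in the onap namespace (the pod names below are placeholders; deleting a pod makes its controller recreate it):

        kubectl -n onap get pod | grep dmaap
        kubectl -n onap delete pod <zookeeper-pod>
        sleep 60
        kubectl -n onap delete pod <kafka-pod>
        sleep 60
        kubectl -n onap delete pod <message-router-pod>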
  18. dev-consul-consul takes up a lot of disk space

    Problem: Node disk alarm

    Troubleshooting: checking the disk usage under /var/lib/docker/ with du -hs *, the problem turns out to be the relatively large usage of the following directory:

    /var/lib/docker/aufs/diff/b759b23cb79cff6cecdf0e44f7d9a1fb03db018f0c5c48696edcf7e23e2d045b/home/consul/.kube/http-cache/.diskv-temp/

    With kubectl -n onap get pod -o wide | grep consul, confirm that the pod is dev-consul-consul-6d7675f5b5-sxrmq, and double-check by running kubectl exec into this pod.

    Solution: delete all files under /home/consul/.kube/http-cache/.diskv-temp/ inside the pod (see the sketch below).
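
    A minimal sketch of the cleanup, using the pod name from this example:

        kubectl -n onap exec dev-consul-consul-6d7675f5b5-sxrmq -- sh -c 'rm -rf /home/consul/.kube/http-cache/.diskv-temp/*'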

    The following is part of the file listing in the example.

  19. Can't delete statefulset

    If you are using kubectl 1.8.0, it needs to be upgraded to kubectl version 1.9.0 or above.
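
    To check which client version is in use:

        kubectl version --short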

  20. Rollback after image update

    1. Check the update history of the deployment
    kubectl rollout history deployment nginx-deployment
    
    2. Roll back to the previous version
    
    kubectl rollout undo deployment nginx-deployment
    
    3. Roll back to a specified version
    kubectl rollout undo deployment nginx-deployment --to-revision=2


  21. Update images in oom with the docker-manifest.csv from the integration repo

    #!/usr/bin/env bash
    
    # clone the casablanca branch of the integration repo and copy its image manifest into oom
    cd $HOME
    git clone -b casablanca https://gerrit.onap.org/r/integration
    cp $HOME/integration/version-manifest/src/main/resources/docker-manifest.csv $HOME/oom/
    
    # for every "image,tag" line of the manifest (skipping the header),
    # rewrite the tag of that image in every values.yaml under oom
    version_new="$HOME/oom/docker-manifest.csv"
    for line in $(tail -n +2 $version_new); do
            image=$(echo $line | cut -d , -f 1)
            tag=$(echo $line | cut -s -d , -f 2)
            perl -p -i -e "s|$image(:.*$\|$)|$image:$tag|g" $(find $HOME/oom/ -name values.yaml)
    done
    
    
  22. Delete ONAP

    1. Delete using helm
    
        helm delete $(helm list|tail -n +2|awk '{print $1}') --purge &
        
     2. Delete the remaining API objects in the onap namespace in k8s
      
         kubectl -n onap get deployments|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete deployments
         kubectl -n onap get statefulset|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete statefulset
         kubectl -n onap get jobs|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete jobs
         kubectl -n onap get pvc|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete pvc
         kubectl -n onap get secrets|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete secrets
         kubectl -n onap get configmaps|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete configmaps
         kubectl -n onap get svc|tail -n +2 |awk '{print $1}'|xargs kubectl -n onap delete svc
       
    3. Delete the data in NFS (a sketch follows)
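
        A minimal sketch, assuming the NFS share is mounted at /dockerdata-nfs (move the data to a backup directory instead of deleting it if it may still be needed):

            mv /dockerdata-nfs/* /bak/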
    
    
  23. Missing svc or yaml configuration file

    1. Export it from the current environment (take the vfc-catalog logging configmap as an example)
    
        kubectl -n onap get cm dev-vfc-vfc-catalog-logging-configmap --export -o yaml >>dev-vfc-vfc-catalog-logging-configmap.yaml
        
    2. Apply it: kubectl -n onap apply -f dev-vfc-vfc-catalog-logging-configmap.yaml
    
    3. Restart the pod
    
    Note: the nodePort field specified in a service deployed by helm is not exported this way; the reason is still being investigated.
  24. Calling multiple k8s API objects at one time causes a hang

    Problem: when creating 60 services at a time, the kubectl command hangs during execution.

    Reason: the Java process in the Rancher server container runs out of memory (OOM).

    Temporary solution: manipulate the API objects in batches.

    Permanent solution: modify the memory limit in the Java parameters.


    Specify the Xmx value when installing; the default is 4096M, and it can be increased to 8192M:

    docker run -d --restart=unless-stopped -e JAVA_OPTS="-Xmx8192m" -p 8080:8080 --name rancher_server rancher/server:v$RANCHER_VERSION

  25. Filter image version

    Filter the image versions used in oom/kubernetes (take VFC as an example)

    grep -r -E 'image|Image:' ~/oom/kubernetes/|awk '{print $2}'|grep onap|grep vfc

  26. Service Port configuration

     ports:
      - port: 9090
        protocol: TCP
        targetPort: 80
        nodePort: 32000
    
    
    targetPort is the port on which the container (docker) provides the service
    
    port is the port through which the service is accessed inside the cluster
    
    nodePort exposes the service on every node in NodePort mode; by default it is assigned randomly from the range 30000-32767 (see the example below)
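    
    A sketch of how the three ports are used with the example above (the service name and node IP are placeholders):
    
        # inside the cluster: the service port 9090 forwards to container port 80
        curl http://<service-name>.onap:9090/
    
        # from outside the cluster: any node IP plus the nodePort
        curl http://<node-ip>:32000/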
    
    