
Table of Contents

Developer Setup

...

Code Block
languagebash
themeMidnight
cd oom/kubernetes
# do a make if anything is modified in your charts
sudo make all
#sudo make onap
ubuntu@ip-172-31-19-23:~/oom/kubernetes$ sudo helm upgrade -i onap local/onap --namespace onap --set log.enabled=false
# wait and check in another terminal for all containers to terminate
ubuntu@ip-172-31-19-23:~$ kubectl get pods --all-namespaces | grep onap-log
onap          onap-log-elasticsearch-7557486bc4-5mng9          0/1       CrashLoopBackOff   9          29m
onap          onap-log-kibana-fc88b6b79-nt7sd                  1/1       Running            0          35m
onap          onap-log-logstash-c5z4d                          1/1       Terminating        0          4h
onap          onap-log-logstash-ftxfz                          1/1       Terminating        0          4h
onap          onap-log-logstash-gl59m                          1/1       Terminating        0          4h
onap          onap-log-logstash-nxsf8                          1/1       Terminating        0          4h
onap          onap-log-logstash-w8q8m                          1/1       Terminating        0          4h
# disable the portal chart while its configuration is being modified
sudo helm upgrade -i onap local/onap --namespace onap --set portal.enabled=false
# edit the logback config delivered by the portal-sdk chart
sudo vi portal/charts/portal-sdk/resources/config/deliveries/properties/ONAPPORTALSDK/logback.xml
# rebuild the modified chart and the umbrella chart
sudo make portal
sudo make onap
# verify the log pods have terminated before re-enabling
ubuntu@ip-172-31-19-23:~$ kubectl get pods --all-namespaces | grep onap-log
# re-enable the log and portal charts
sudo helm upgrade -i onap local/onap --namespace onap --set log.enabled=true
sudo helm upgrade -i onap local/onap --namespace onap --set portal.enabled=true
ubuntu@ip-172-31-19-23:~$ kubectl get pods --all-namespaces | grep onap-log
onap          onap-log-elasticsearch-7557486bc4-2jd65          0/1       Init:0/1           0          31s
onap          onap-log-kibana-fc88b6b79-5xqg4                  0/1       Init:0/1           0          31s
onap          onap-log-logstash-5vq82                          0/1       Init:0/1           0          31s
onap          onap-log-logstash-gvr9z                          0/1       Init:0/1           0          31s
onap          onap-log-logstash-qqzq5                          0/1       Init:0/1           0          31s
onap          onap-log-logstash-vbp2x                          0/1       Init:0/1           0          31s
onap          onap-log-logstash-wr9rd                          0/1       Init:0/1           0          31s

ubuntu@ip-172-31-19-23:~$ kubectl get pods --all-namespaces | grep onap-portal
onap          onap-portal-app-8486dc7ff8-nbps7                 0/2       Init:0/1           0          9m
onap          onap-portal-cassandra-8588fbd698-4wthv           1/1       Running            0          9m
onap          onap-portal-db-7d6b95cd94-9x4kf                  0/1       Running            0          9m
onap          onap-portal-db-config-dpqkq                      0/2       Init:0/1           0          9m
onap          onap-portal-sdk-77cd558c98-5255r                 0/2       Init:0/1           0          9m
onap          onap-portal-widget-6469f4bc56-g8s62              0/1       Init:0/1           0          9m
onap          onap-portal-zookeeper-5d8c598c4c-czpnz           1/1       Running            0          9m

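Instead of checking by hand in a second terminal, the wait step above can be scripted; a minimal sketch (not part of the original procedure):

Code Block
themeMidnight
# poll until the disabled onap-log pods have fully terminated
while kubectl get pods -n onap | grep -q onap-log; do
  echo "waiting for onap-log pods to terminate..."
  sleep 10
done
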
Downgrade Docker if required

Code Block
sudo apt-get autoremove -y docker-engine

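If a specific older Docker release then needs to be reinstalled (Rancher 1.6 only supports certain Docker versions), one way to pin a version is sketched below; the exact version string depends on your apt repositories:

Code Block
themeMidnight
# list the docker-ce versions your apt repositories offer
apt-cache madison docker-ce
# install one of the listed versions (replace <version> with a value from above)
sudo apt-get install -y docker-ce=<version>
# alternatively, Rancher publishes per-release convenience scripts,
# e.g. curl https://releases.rancher.com/install-docker/17.03.sh | sh
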
Change max-pods from default 110 pod limit

Rancher ships with a 110-pods-per-node limit; you can override this on the Kubernetes 1.10 template.

https://lists.onap.org/g/onap-discuss/topic/oom_110_kubernetes_pod/25213556?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,25213556

Manual procedure: change the Kubernetes template (1pt2) before using it to create an environment (1a7).

Add --max-pods=500 to the "Additional Kubelet Flags" box on the v1.10.13 version of the Kubernetes template, reached via the "Manage Environments" dropdown on the left of the Rancher console (port 8880).

View file
name20180905_rancher_increase_110_pod_limit.mp4
height250

Jira
serverONAP JIRA
serverId425b2b0a-557c-3c0c-b515-579789cceedb
keyOOM-1137


Alternatively, capture the output of the REST PUT call and add it around line 111 of the script https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh#n111


Automated procedure (in progress)

Code Block
themeMidnight
ubuntu@ip-172-31-27-183:~$ curl 'http://127.0.0.1:8880/v2-beta/projecttemplates/1pt2' --data-binary '{"id":"1pt2","type":"projectTemplate","baseType":"projectTemplate","name":"Kubernetes","state":"active","accountId":null,"created":"2018-09-05T14:12:24Z","createdTS":1536156744000,"data":{"fields":{"stacks":[{"name":"healthcheck","templateId":"library:infra*healthcheck"},{"answers":{"CONSTRAINT_TYPE":"none","CLOUD_PROVIDER":"rancher","AZURE_CLOUD":"AzurePublicCloud","AZURE_TENANT_ID":"","AZURE_CLIENT_ID":"","AZURE_CLIENT_SECRET":"","AZURE_SEC_GROUP":"","RBAC":false,"REGISTRY":"","BASE_IMAGE_NAMESPACE":"","POD_INFRA_CONTAINER_IMAGE":"rancher/pause-amd64:3.0","HTTP_PROXY":"","NO_PROXY":"rancher.internal,cluster.local,rancher-metadata,rancher-kubernetes-auth,kubernetes,169.254.169.254,169.254.169.250,10.42.0.0/16,10.43.0.0/16","ENABLE_ADDONS":true,"ENABLE_RANCHER_INGRESS_CONTROLLER":true,"RANCHER_LB_SEPARATOR":"rancherlb","DNS_REPLICAS":"1","ADDITIONAL_KUBELET_FLAGS":"","FAIL_ON_SWAP":"false","ADDONS_LOG_VERBOSITY_LEVEL":"2","AUDIT_LOGS":false,"ADMISSION_CONTROLLERS":"NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,ResourceQuota","SERVICE_CLUSTER_CIDR":"10.43.0.0/16","DNS_CLUSTER_IP":"10.43.0.10","KUBEAPI_CLUSTER_IP":"10.43.0.1","KUBERNETES_CIPHER_SUITES":"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305","DASHBOARD_CPU_LIMIT":"100m","DASHBOARD_MEMORY_LIMIT":"300Mi","INFLUXDB_HOST_PATH":"","EMBEDDED_BACKUPS":true,"BACKUP_PERIOD":"15m0s","BACKUP_RETENTION":"24h","ETCD_HEARTBEAT_INTERVAL":"500","ETCD_ELECTION_TIMEOUT":"5000"},"name":"kubernetes","templateVersionId":"library:infra*k8s:47"},{"name":"network-services","templateId":"library:infra*network-services"},{"name":"ipsec","templateId":"library:infra*ipsec"}]}},"description":"Default Kubernetes 
template","externalId":"catalog://library:project*kubernetes:0","isPublic":true,"kind":"projectTemplate","removeTime":null,"removed":null,"stacks":[{"type":"catalogTemplate","name":"healthcheck","templateId":"library:infra*healthcheck"},{"type":"catalogTemplate","answers":{"CONSTRAINT_TYPE":"none","CLOUD_PROVIDER":"rancher","AZURE_CLOUD":"AzurePublicCloud","AZURE_TENANT_ID":"","AZURE_CLIENT_ID":"","AZURE_CLIENT_SECRET":"","AZURE_SEC_GROUP":"","RBAC":false,"REGISTRY":"","BASE_IMAGE_NAMESPACE":"","POD_INFRA_CONTAINER_IMAGE":"rancher/pause-amd64:3.0","HTTP_PROXY":"","NO_PROXY":"rancher.internal,cluster.local,rancher-metadata,rancher-kubernetes-auth,kubernetes,169.254.169.254,169.254.169.250,10.42.0.0/16,10.43.0.0/16","ENABLE_ADDONS":true,"ENABLE_RANCHER_INGRESS_CONTROLLER":true,"RANCHER_LB_SEPARATOR":"rancherlb","DNS_REPLICAS":"1","ADDITIONAL_KUBELET_FLAGS":"--max-pods=600","FAIL_ON_SWAP":"false","ADDONS_LOG_VERBOSITY_LEVEL":"2","AUDIT_LOGS":false,"ADMISSION_CONTROLLERS":"NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,ResourceQuota","SERVICE_CLUSTER_CIDR":"10.43.0.0/16","DNS_CLUSTER_IP":"10.43.0.10","KUBEAPI_CLUSTER_IP":"10.43.0.1","KUBERNETES_CIPHER_SUITES":"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305","DASHBOARD_CPU_LIMIT":"100m","DASHBOARD_MEMORY_LIMIT":"300Mi","INFLUXDB_HOST_PATH":"","EMBEDDED_BACKUPS":true,"BACKUP_PERIOD":"15m0s","BACKUP_RETENTION":"24h","ETCD_HEARTBEAT_INTERVAL":"500","ETCD_ELECTION_TIMEOUT":"5000"},"name":"kubernetes","templateVersionId":"library:infra*k8s:47"},{"type":"catalogTemplate","name":"network-services","templateId":"library:infra*network-services"},{"type":"catalogTemplate","name":"ipsec","templateId":"library:infra*ipsec"}],"transitioning":"no","transitioningMessage":null,"transitioningProgress":null,"uuid":null}' --compressed

{"id":"9107b9ce-0b61-4c22-bc52-f147babb0ba7","type":"error","links":{},"actions":{},"status":405,"code":"Method not allowed","message":"Method not allowed","detail":null,"baseType":"error"}

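Note that curl sends a POST when --data-binary is supplied without an explicit method, which may be why the call above returns 405. A hedged retry as an explicit PUT, assuming the v2-beta endpoint accepts PUT and reusing the full JSON payload captured above (saved here as template.json):

Code Block
themeMidnight
# re-send the captured project template as an explicit PUT
# template.json holds the full JSON body from the capture above
curl -X PUT 'http://127.0.0.1:8880/v2-beta/projecttemplates/1pt2' \
  -H 'Content-Type: application/json' \
  --data-binary @template.json
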
Results

Single AWS 244 GB / 32 vCore VM with the 110-pod-limit workaround applied: 164 pods (including both secondary DCAEGEN2 orchestrations at 30 and 55 minutes). Most of the remaining 8 container failures are known, in-progress issues.


Code Block
themeMidnight
ubuntu@ip-172-31-20-218:~$ free
              total        used        free      shared  buff/cache   available
Mem:      251754696   111586672    45000724      193628    95167300   137158588
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | wc -l
164
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l
155
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' | wc -l
8
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2'
onap          dep-dcae-ves-collector-59d4ff58f7-94rpq                 1/2       Running                 0          4m
onap          onap-aai-champ-68ff644d85-rv7tr                         0/1       Running                 0          59m
onap          onap-aai-gizmo-856f86d664-q5pvg                         1/2       CrashLoopBackOff        10         59m
onap          onap-oof-85864d6586-zcsz5                               0/1       ImagePullBackOff        0          59m
onap          onap-pomba-kibana-d76b6dd4c-sfbl6                       0/1       Init:CrashLoopBackOff   8          59m
onap          onap-pomba-networkdiscovery-85d76975b7-mfk92            1/2       CrashLoopBackOff        11         59m
onap          onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9   1/2       CrashLoopBackOff        10         59m
onap          onap-vid-84c88db589-8cpgr                               1/2       CrashLoopBackOff        9          59m




Operations

Get failed/pending containers

Code Block
themeMidnight
linenumberstrue
kubectl get pods --all-namespaces | grep -E "0/|1/2" | wc -l
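
The same filter can be turned into a simple wait loop during deployments; a sketch, not part of the original page:

Code Block
themeMidnight
# poll until no onap pod reports failed or partially-ready containers
while [ "$(kubectl get pods --all-namespaces | grep onap | grep -E '0/|1/2' | wc -l)" -gt 0 ]; do
  echo "waiting for onap pods to become ready..."
  sleep 30
done
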
Code Block
kubectl cluster-info
# get pods/containers
kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
default       nginx-1389790254-lgkz3                 1/1       Running   1          5d
kube-system   heapster-4285517626-x080g              1/1       Running   1          6d
kube-system   kube-dns-638003847-tst97               3/3       Running   3          6d
kube-system   kubernetes-dashboard-716739405-fnn3g   1/1       Running   2          6d
kube-system   monitoring-grafana-2360823841-hr824    1/1       Running   1          6d
kube-system   monitoring-influxdb-2323019309-k7h1t   1/1       Running   1          6d
kube-system   tiller-deploy-737598192-x9wh5          1/1       Running   1          6d
# get port mappings
kubectl get services --all-namespaces -o wide
# exec into a pod
kubectl -n default exec -it nginx-1389790254-lgkz3 /bin/bash
# get logs
kubectl -n default logs -f nginx-1389790254-lgkz3

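When a pod will not start, its events usually explain why; two hedged examples for pulling them out (the pod name is a placeholder):

Code Block
themeMidnight
# show the state and recent events for a single pod
kubectl -n onap describe pod <pod>
# or list recent events for the whole namespace, oldest first
kubectl get events -n onap --sort-by='.lastTimestamp'
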
Exec

Code Block
themeMidnight
kubectl -n onap-aai exec -it aai-resources-1039856271-d9bvq bash

Bounce/Fix a failed container

Periodically a container higher in a dependency tree does not get restarted in time to pick up its running child containers; usually this is the Kibana container.

Fix this (or any) container by deleting the pod in question; Kubernetes will bring up a replacement.

Code Block
themeMidnight
root@a-onap-auto-20180412-ref:~# kubectl get services --all-namespaces | grep log
onap          dev-vfc-catalog               ClusterIP   10.43.210.8     <none>        8806/TCP                                                                     5d
onap          log-es                        NodePort    10.43.77.87     <none>        9200:30254/TCP                                                               5d
onap          log-es-tcp                    ClusterIP   10.43.159.93    <none>        9300/TCP                                                                     5d
onap          log-kibana                    NodePort    10.43.41.102    <none>        5601:30253/TCP                                                               5d
onap          log-ls                        NodePort    10.43.180.165   <none>        5044:30255/TCP                                                               5d
onap          log-ls-http                   ClusterIP   10.43.13.180    <none>        9600/TCP                                                                     5d
root@a-onap-auto-20180412-ref:~# kubectl get pods --all-namespaces | grep log
onap          dev-log-elasticsearch-66cdc4f855-wmpkz         1/1       Running                 0          5d
onap          dev-log-kibana-5b6f86bcb4-drpzq                0/1       Running                 1076       5d
onap          dev-log-logstash-6d9fdccdb6-ngq2f              1/1       Running                 0          5d
onap          dev-vfc-catalog-7d89bc8b9d-vxk74               2/2       Running                 0          5d
root@a-onap-auto-20180412-ref:~# kubectl delete pod dev-log-kibana-5b6f86bcb4-drpzq -n onap
pod "dev-log-kibana-5b6f86bcb4-drpzq" deleted
root@a-onap-auto-20180412-ref:~# kubectl get pods --all-namespaces | grep log
onap          dev-log-elasticsearch-66cdc4f855-wmpkz         1/1       Running                 0          5d
onap          dev-log-kibana-5b6f86bcb4-drpzq                0/1       Terminating             1076       5d
onap          dev-log-kibana-5b6f86bcb4-gpn2m                0/1       Pending                 0          12s
onap          dev-log-logstash-6d9fdccdb6-ngq2f              1/1       Running                 0          5d
onap          dev-vfc-catalog-7d89bc8b9d-vxk74               2/2       Running                 0          5d


Remove containers stuck in terminating

A helm namespace delete, a kubectl delete, or a helm purge may not remove everything if persistent volumes are still hanging around; force-delete the remaining pods:

Code Block
themeMidnight
# after a full purge of the deployment
sudo helm delete --purge onap
# force-remove any pods left stuck in Terminating
kubectl delete pods <pod> --grace-period=0 --force -n onap

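If persistent volumes or claims are what is hanging, they can be cleaned up explicitly; a hedged sketch (claim names are placeholders):

Code Block
themeMidnight
# list volumes and claims left behind by the deployment
kubectl get pv | grep onap
kubectl get pvc -n onap
# delete a hanging claim; if it stays in Terminating, clear its finalizers
kubectl delete pvc <claim> -n onap
kubectl patch pvc <claim> -n onap -p '{"metadata":{"finalizers":null}}'
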

Persistent Volumes

Several ONAP applications require persistent configuration or storage outside of the stateless Docker containers managed by Kubernetes. Kubernetes can either act as a direct wrapper around native Docker volumes or provide its own dynamic persistence for cases where scaled pods run across multiple hosts.

https://kubernetes.io/docs/concepts/storage/persistent-volumes/

The SDNC clustering poc - https://gerrit.onap.org/r/#/c/25467/23

For example, the following JIRA has a patch that exposes a directory into the container, just like a Docker volume or a volume in docker-compose; the issue here is mixing emptyDir (sharing directories between containers) with exposing directories out to the host filesystem/NFS.

https://jira.onap.org/browse/LOG-52

This is only one way to define a static PV in Kubernetes:

https://jira.onap.org/secure/attachment/10436/LOG-50-expose_mso_logs.patch
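
As a concrete illustration of the static approach, a hostPath PersistentVolume plus a matching claim can be applied directly; the names and paths below are illustrative only, not taken from the patch above:

Code Block
themeMidnight
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-logs-pv                  # illustrative name
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /dockerdata-nfs/demo/logs   # host directory exposed to the pod
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-logs-pvc
  namespace: onap
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
EOF
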

Token

Thanks Joey

Code Block
themeMidnight
root@ip-172-31-27-86:~# kubectl describe secret $(kubectl get secrets | grep default | cut -f1 -d ' ')
Name:         default-token-w1jq0
Namespace:    default
Labels:       <none>
Annotations:  kubernetes.io/service-account.name=default
              kubernetes.io/service-account.uid=478eae11-f0f4-11e7-b936-022346869a82
Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  7 bytes
token:      eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImRlZmF1bHQtdG9rZW4tdzFqcTAiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGVmYXVsdCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjQ3OGVhZTExLWYwZjQtMTFlNy1iOTM2LTAyMjM0Njg2OWE4MiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpkZWZhdWx0OmRlZmF1bHQifQ.Fjv6hA1Kzurr-Cie5EZmxMOoxm-3Uh3zMGvoA4Xu6h2U1-NBp_fw_YW7nSECnI7ttGz67mxAjknsgfze-1JtgbIUtyPP31Hp1iscaieu5r4gAc_booBdkV8Eb8gia6sF84Ye10lsS4nkmmjKA30BdqH9qjWspChLPdGdG3_RmjApIHEOjCqQSEHGBOMvY98_uO3jiJ_XlJBwLL4uydjhpoANrS0xlS_Evn0evLdits7_piklbc-uqKJBdZ6rWyaRbkaIbwNYYhg7O-CLlUVuExynAAp1J7Mo3qITNV_F7f4l4OIzmEf3XLho4a1KIGb76P1AOvSrXgTzBq0Uvh5fUw

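The token can then be used to call the Kubernetes API directly; a minimal sketch (the API server host and port below are placeholders for your environment, not values from this page):

Code Block
themeMidnight
# pull the default service-account token out of the secret and decode it
TOKEN=$(kubectl get secret $(kubectl get secrets | grep default | cut -f1 -d ' ') \
  -o jsonpath='{.data.token}' | base64 --decode)
# call the API server with it (replace host:port with your cluster's endpoint)
curl -sk -H "Authorization: Bearer $TOKEN" https://<api-server-host>:6443/api
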

Auto Scaling

Using the example on page 122 of Kubernetes Up & Running.

Code Block
# create a single-replica nginx deployment
kubectl run nginx --image=nginx:1.7.12
kubectl get deployments nginx
# scale it out to 3 replicas and inspect the resulting objects
kubectl scale deployments nginx --replicas=3
kubectl get deployments nginx
kubectl get replicasets --selector=run=nginx
kubectl get pods --all-namespaces
# scale it out further
kubectl scale deployments nginx --replicas=64

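The commands above scale the deployment manually; for actual autoscaling, a Horizontal Pod Autoscaler can be attached. A sketch, which relies on CPU metrics being available (e.g. from the heapster pod shown earlier):

Code Block
themeMidnight
# keep average CPU around 80%, between 2 and 10 replicas
kubectl autoscale deployment nginx --min=2 --max=10 --cpu-percent=80
# watch the autoscaler's current/target metrics and replica count
kubectl get hpa nginx
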

Developer Deployment

Deployment Integrity

...