
Has anyone ever tried to leave ONAP running for more than 24 hours? We set it up from the master branch using the Rancher approach. Apparently 64GB of RAM is NOT enough for a full installation: OOMs were happening overnight and the kernel was killing processes at random (including the docker daemon), so the system became highly unstable. We have to restart the docker daemon once a day to mitigate the issue, but it's really annoying.


Environment details
OS: Ubuntu 16.04.3 LTS
RAM: 64GB
CPU: 2.4 GHz, 24 cores
DISK: 120GB


Rancher version: v1.6.14


Helm version:
Client: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}


Kubectl version:
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.5-rancher1", GitCommit:"6cb179822b9f77893eac5612c91a0ed7c0941b45", GitTreeState:"clean", BuildDate:"2017-12-11T17:40:37Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Docker version:
Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 03:35:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 03:35:14 2017
 OS/Arch:      linux/amd64
 Experimental: false
Docker info
Containers: 265
 Running: 221
 Paused: 0
 Stopped: 44
Images: 108
Server Version: 17.03.2-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1441
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-109-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 62.92 GiB
Name: onap
ID: GKSM:L2MG:BRRL:YYFT:YOPK:HENI:HTYS:6LNN:JA3K:67RO:T4YY:FRS5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 1369
 Goroutines: 33357
 System Time: 2018-03-07T13:48:47.174044656Z
 EventsListeners: 2
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support



Docker images
REPOSITORY                                                  TAG                  IMAGE ID            CREATED             SIZE
ubuntu                                                      16.04                f975c5035748        15 hours ago        112 MB
nexus3.onap.org:10001/onap/aaf/authz-service                latest               36b3cc05b71b        24 hours ago        738 MB
quay.io/influxdb/chronograf                                 1.4.2.1              cf2f8b55becf        6 days ago          38.9 MB
nexus3.onap.org:10001/onap/oom/kube2msb                     latest               8f55959d6c1e        6 days ago          41.3 MB
busybox                                                     latest               f6e427c148a7        6 days ago          1.15 MB
mysql                                                       5.7                  5d4d51c57ea8        8 days ago          374 MB
dorowu/ubuntu-desktop-lxde-vnc                              latest               0ddcc9e7a313        10 days ago         1.32 GB
nexus3.onap.org:10001/onap/refrepo/postgres                 latest               699c19a97093        13 days ago         263 MB
nexus3.onap.org:10001/openecomp/testsuite                   1.1-STAGING-latest   8489d82a2d46        2 weeks ago         1.14 GB
nexus3.onap.org:10001/library/mariadb                       10                   e1dd445713ae        2 weeks ago         396 MB
oomk8s/readiness-check                                      1.1.0                c0cce3bbe795        2 weeks ago         585 MB
consul                                                      latest               5f4915f05e27        3 weeks ago         54.2 MB
mysql/mysql-server                                          5.6                  30dc57b553c0        6 weeks ago         226 MB
gcr.io/kubernetes-helm/tiller                               v2.8.0               7257caf71e74        6 weeks ago         71.5 MB
rancher/server                                              v1.6.14              d63b9b4bd205        7 weeks ago         1.08 GB
rancher/agent                                               v1.2.9               34a453d374b9        7 weeks ago         237 MB
rancher/k8s                                                 v1.8.5-rancher4      de62058927b2        7 weeks ago         1.6 GB
wurstmeister/kafka                                          latest               4e09479aca4e        2 months ago        277 MB
rancher/net                                                 v0.13.7              1d3351fae706        2 months ago        310 MB
rancher/network-manager                                     v0.7.19              87e7ab1c6276        2 months ago        256 MB
gcr.io/google_containers/kubernetes-dashboard-amd64         v1.8.0               55dbc28356f2        3 months ago        119 MB
nexus3.onap.org:10001/onap/refrepo                          1.0-STAGING-latest   3912055963a3        3 months ago        1.32 GB
rancher/lb-service-rancher                                  v0.7.17              b7fa6b9cb097        3 months ago        361 MB
nexus3.onap.org:10001/onap/usecase-ui/usecase-ui-server     v1.0.1               0a8b1e563d15        3 months ago        1.09 GB
nexus3.onap.org:10001/openecomp/vid                         v1.1.1               abc2193edc35        3 months ago        763 MB
nexus3.onap.org:10001/onap/usecase-ui                       v1.0.1               ac54269ccde4        3 months ago        552 MB
nexus3.onap.org:10001/onap/multicloud/openstack-windriver   v1.0.0               d3f43f230f01        3 months ago        736 MB
nexus3.onap.org:10001/onap/multicloud/openstack-ocata       v1.0.0               431c73e74365        3 months ago        736 MB
nexus3.onap.org:10001/openecomp/mso                         v1.1.1               7758d7bfc845        3 months ago        1.57 GB
nexus3.onap.org:10001/onap/multicloud/framework             v1.0.0               39e2f113e853        3 months ago        792 MB
nexus3.onap.org:10001/onap/multicloud/vio                   v1.0.0               8ba1b50921a2        3 months ago        744 MB
nexus3.onap.org:10001/onap/sdnc-dmaap-listener-image        v1.2.1               c56afa78d1e8        3 months ago        979 MB
nexus3.onap.org:10001/onap/sdnc-ueb-listener-image          v1.2.1               49cb4c20483e        3 months ago        956 MB
nexus3.onap.org:10001/onap/admportal-sdnc-image             v1.2.1               4bf7f8c2078b        3 months ago        957 MB
nexus3.onap.org:10001/onap/sdnc-image                       v1.2.1               c57cde24538b        3 months ago        1.91 GB
nexus3.onap.org:10001/openecomp/sdc-cassandra               v1.1.0               119363b4876b        3 months ago        874 MB
nexus3.onap.org:10001/openecomp/sdc-kibana                  v1.1.0               5a60bc32b541        3 months ago        487 MB
nexus3.onap.org:10001/openecomp/sdc-elasticsearch           v1.1.0               9c9473412794        3 months ago        541 MB
nexus3.onap.org:10001/openecomp/sdc-frontend                v1.1.0               72064d647e90        3 months ago        824 MB
nexus3.onap.org:10001/openecomp/sdc-backend                 v1.1.0               bd2289ba829d        3 months ago        1.39 GB
nexus3.onap.org:10001/onap/data-router                      v1.1.0               11e2c943f746        3 months ago        603 MB
nexus3.onap.org:10001/onap/vfc/nfvo/svnfm/nokia             v1.0.2               7eac04ba9bec        3 months ago        1.09 GB
nexus3.onap.org:10001/onap/vfc/nslcm                        v1.0.2               9900c9b008b2        3 months ago        643 MB
nexus3.onap.org:10001/openecomp/aai-resources               v1.1.0               95509a04dd08        3 months ago        680 MB
nexus3.onap.org:10001/onap/vfc/nfvo/svnfm/huawei            v1.0.2               7346e6e58147        3 months ago        1.07 GB
nexus3.onap.org:10001/openecomp/aai-traversal               v1.1.0               afd092a498e3        3 months ago        680 MB
nexus3.onap.org:10001/onap/vfc/catalog                      v1.0.2               46fb032c97aa        3 months ago        621 MB
nexus3.onap.org:10001/onap/sparky-be                        v1.1.0               643d74206804        3 months ago        601 MB
nexus3.onap.org:10001/onap/policy/policy-pe                 v1.1.1               4d51b4c183c3        3 months ago        1.55 GB
nexus3.onap.org:10001/onap/policy/policy-drools             v1.1.1               14bfa9aba655        3 months ago        1.07 GB
nexus3.onap.org:10001/onap/policy/policy-db                 v1.1.1               31e0a80a3255        3 months ago        1.17 GB
nexus3.onap.org:10001/onap/policy/policy-nexus              v1.1.1               ba0ff1015384        3 months ago        1.01 GB
nexus3.onap.org:10001/openecomp/appc-image                  v1.2.0               399e222d320b        3 months ago        3.04 GB
nexus3.onap.org:10001/onap/ccsdk-dgbuilder-image            v0.1.0               3e4649f81feb        3 months ago        980 MB
nexus3.onap.org:10001/onap/vfc/jujudriver                   v1.0.0               ff409fbc9027        3 months ago        1.09 GB
nexus3.onap.org:10001/onap/vfc/gvnfmdriver                  v1.0.1               d199a237b317        3 months ago        443 MB
nexus3.onap.org:10001/onap/vfc/vnfmgr                       v1.0.1               3e81fae1fad3        3 months ago        582 MB
nexus3.onap.org:10001/onap/vfc/vnfres                       v1.0.1               2bf57e296a5a        3 months ago        582 MB
nexus3.onap.org:10001/onap/search-data-service              v1.1.0               d148103ce73a        3 months ago        612 MB
nexus3.onap.org:10001/onap/vfc/vnflcm                       v1.0.1               b236d8a5e32e        3 months ago        583 MB
nexus3.onap.org:10001/onap/vfc/resmanagement                v1.0.0               8bf64c29dcc5        3 months ago        1.07 GB
nexus3.onap.org:10001/onap/vfc/wfengine-activiti            v1.0.0               872db550f2f8        3 months ago        144 MB
nexus3.onap.org:10001/onap/vfc/emsdriver                    v1.0.1               e74489cb68c0        3 months ago        495 MB
nexus3.onap.org:10001/onap/model-loader                     v1.1.0               06fb59df5644        3 months ago        603 MB
nexus3.onap.org:10001/onap/vfc/ztesdncdriver                v1.0.0               a167afebb6ef        3 months ago        484 MB
nexus3.onap.org:10001/onap/vfc/wfengine-mgrservice          v1.0.0               f68056246ebf        3 months ago        117 MB
nexus3.onap.org:10001/onap/clamp                            v1.1.0               0442da642684        3 months ago        464 MB
nexus3.onap.org:10001/onap/aai/esr-gui                      v1.0.0               919974f2addb        4 months ago        525 MB
nexus3.onap.org:10001/onap/aai/esr-server                   v1.0.0               a19b83cc6d76        4 months ago        526 MB
nexus3.onap.org:10001/onap/cli                              v1.1.0               71c17999fa6c        4 months ago        849 MB
aaionap/haproxy                                             1.1.0                8d3554ec5751        4 months ago        139 MB
nexus3.onap.org:10001/onap/msb/msb_apigateway               1.0.0                8245d1b34d29        4 months ago        215 MB
nexus3.onap.org:10001/onap/msb/msb_discovery                1.0.0                1ab27b2abcfe        4 months ago        201 MB
nexus3.onap.org:10001/onap/portal-wms                       v1.3.0               fff0077a3c33        4 months ago        237 MB
nexus3.onap.org:10001/onap/portal-apps                      v1.3.0               8da6312ec821        4 months ago        677 MB
nexus3.onap.org:10001/onap/portal-db                        v1.3.0               7578762221a7        4 months ago        398 MB
rancher/metadata                                            v0.9.5               bd33f8c865b1        4 months ago        251 MB
rancher/kubectld                                            v0.8.5               dae21e9b9d0b        5 months ago        476 MB
gcr.io/google_containers/k8s-dns-sidecar-amd64              1.14.5               fed89e8b4248        5 months ago        41.8 MB
gcr.io/google_containers/k8s-dns-kube-dns-amd64             1.14.5               512cd7425a73        5 months ago        49.4 MB
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64        1.14.5               459944ce8cc4        5 months ago        41.4 MB
rancher/kubernetes-agent                                    v0.6.6               c87b478821f8        5 months ago        326 MB
consul                                                      0.9.3                8d44ae3c4e67        5 months ago        51.4 MB
aaionap/hbase                                               1.2.0                3568a85848ab        5 months ago        432 MB
gcr.io/google_containers/heapster-influxdb-amd64            v1.3.3               577260d221db        6 months ago        12.5 MB
gcr.io/google_containers/heapster-grafana-amd64             v4.4.3               8cb3de219af7        6 months ago        152 MB
rancher/dns                                                 v0.15.3              2779a18358f2        6 months ago        240 MB
rancher/kubernetes-auth                                     v0.0.8               ea0a0fa94c0c        6 months ago        293 MB
rancher/healthcheck                                         v0.3.3               14de771cc178        6 months ago        385 MB
rancher/etcd                                                v2.3.7-13            6c21cf015451        7 months ago        57.2 MB
rancher/etc-host-updater                                    v0.0.3               da6f714674e6        7 months ago        241 MB
docker.elastic.co/beats/filebeat                            5.5.0                b61327632415        8 months ago        271 MB
docker.elastic.co/kibana/kibana                             5.5.0                be0b56c8b9ee        8 months ago        630 MB
docker.elastic.co/elasticsearch/elasticsearch               5.5.0                2377bc62195f        8 months ago        510 MB
gcr.io/google_containers/heapster-amd64                     v1.4.0               749531a6d2cf        8 months ago        73.4 MB
docker.elastic.co/logstash/logstash                         5.4.3                0ad2b27b0ed6        8 months ago        605 MB
nexus3.onap.org:10001/library/cassandra                     2.1.17               77d9a025eb78        8 months ago        357 MB
attos/dmaap                                                 latest               b0ae220fcf1f        8 months ago        747 MB
oomk8s/ubuntu-init                                          1.0.0                14bb4db11858        9 months ago        207 MB
oomk8s/readiness-check                                      1.0.0                d3923ba1f99c        9 months ago        579 MB
oomk8s/mariadb-client-init                                  1.0.0                a5fa953bd4e0        9 months ago        251 MB
quay.io/kubernetes_incubator/nfs-provisioner                v1.0.8               b49c0db0c8b4        10 months ago       337 MB
rancher/net                                                 holder               665d9f6e8cc1        11 months ago       267 MB
wurstmeister/zookeeper                                      latest               351aa00d2fe9        15 months ago       478 MB
gcr.io/google-samples/xtrabackup                            1.0                  c415dbd7af07        15 months ago       265 MB
elasticsearch                                               2.4.1                8e3cf79edcc3        16 months ago       346 MB
gcr.io/google_containers/pause-amd64                        3.0                  99e59f495ffa        22 months ago       747 kB
nexus3.onap.org:10001/mariadb                               10.1.11              d1553bc7007f        2 years ago         346 MB



Here are the full debug logs from the docker daemon. Note that around Mar 06 02:32:07 there are messages indicating OOM: docker_daemon_Mar_05.zip


Yesterday we added an additional 16GB of swap and the system survived the OOM. However, memory usage keeps trending upward: at one point today 68GB of memory was in use in total.

Memory usage
@onap:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         55G        1.3G        493M        6.6G        6.4G
Swap:           15G         13G        2.2G
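
The 68GB total mentioned above appears to match used RAM plus used swap in this output (55G + 13G). A minimal sketch for tracking that combined figure, assuming `free -g` output in the layout shown; `total_used_gib` is a hypothetical helper name, not part of any tool:

```shell
# Sum the "used" column of the Mem and Swap rows from `free -g` output (GiB).
total_used_gib() {
  awk '/^Mem:/ {mem = $3} /^Swap:/ {swap = $3} END {print mem + swap}'
}

# Usage on a live host: free -g | total_used_gib
# The sample input below reuses the figures reported above (55G RAM, 13G swap).
printf 'Mem: 62 55 1 0 6 6\nSwap: 15 13 2\n' | total_used_gib
```

Logging this value periodically (for example from cron) would show whether the growth is steady or levels off.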


We noticed that the highest memory consumers were Portal and VID, and they tend to require more and more memory. Here's the history:

Portal.png VID.png


Is such high memory usage by Portal and VID normal? Has anyone experienced something similar?


Beka


    6 answers

    1.  

      After running ONAP for about 1 month, we noticed the docker daemon was consuming ~17GB of memory:


        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
       1051 root      20   0 35.763g 0.017t  12284 S   5.0 27.9   4313:10 dockerd
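
To watch this growth over time rather than from a single `top` snapshot, the resident set size can be read from `/proc`. This is a rough sketch for Linux hosts only; `rss_mib` is a hypothetical helper, and the logging loop in the comment is just one possible way to use it:

```shell
# Print the resident set size (RSS) of a process in MiB, read from /proc (Linux only).
rss_mib() {
  awk '/^VmRSS:/ {printf "%d\n", $2 / 1024}' "/proc/$1/status"
}

# Possible usage: log dockerd's RSS once a minute.
# while sleep 60; do echo "$(date -u +%FT%TZ) $(rss_mib "$(pidof dockerd)")"; done
rss_mib "$$"   # demo: RSS of the current shell
```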
      
      


      It seems there was a bug in docker versions prior to 17.06: https://github.com/moby/moby/issues/32711


      I could verify it with the following output:

      root@onap:~# go tool pprof --inuse_space `which dockerd` dockerd_heap
      Entering interactive mode (type "help" for commands)
      (pprof) top5
      14.90GB of 15.27GB total (97.58%)
      Dropped 1737 nodes (cum <= 0.08GB)
      Showing top 5 nodes out of 22 (cum >= 0.27GB)
            flat  flat%   sum%        cum   cum%
         14.36GB 94.08% 94.08%    14.41GB 94.34%  io.copyBuffer
          0.37GB  2.42% 96.49%     0.37GB  2.42%  runtime.malg
          0.09GB  0.62% 97.11%     0.26GB  1.70%  github.com/docker/docker/daemon.(*Daemon).ContainerExecStart
          0.06GB  0.37% 97.49%     0.16GB  1.04%  github.com/docker/docker/container.AttachStreams
          0.01GB 0.093% 97.58%     0.27GB  1.79%  github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart
      (pprof)
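
The post does not show how the `dockerd_heap` file was obtained. One way that should work, assuming the daemon runs in debug mode (as in the `docker info` output above, where the server debug mode is true) and listens on the default socket path, is to request the Go pprof heap endpoint over the unix socket. `fetch_dockerd_heap` is a hypothetical wrapper, and both defaults are assumptions:

```shell
# Fetch a heap profile from dockerd over its unix socket.
# Assumes the daemon is in debug mode and exposes /debug/pprof/heap on that socket.
fetch_dockerd_heap() {
  sock="${1:-/var/run/docker.sock}"   # assumed default socket path
  out="${2:-dockerd_heap}"            # output file name used in the pprof command above
  curl --silent --fail --unix-socket "$sock" -o "$out" http://localhost/debug/pprof/heap
}

# Usage (as root):
# fetch_dockerd_heap && go tool pprof --inuse_space "$(which dockerd)" dockerd_heap
```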
      1.  
        1. Michael O'Brien

          Good analysis. Essentially we are running as if each JVM owns the entire host, so the GC that would normally kick in when the heap is about 67% full never does.
          Usually GC kicks in at 2/3 and forms a sawtooth pattern.
          Yes, we need to set limits per pod - very good idea.

          https://kubernetes.io/docs/tasks/administer-cluster/memory-default-namespace/#create-a-limitrange-and-a-pod
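
As a sketch of what the linked page describes, the snippet below writes a LimitRange manifest that gives containers a default memory request and limit. The 256Mi / 512Mi values are the placeholders from the Kubernetes example, not numbers tuned for ONAP, and the `onap` namespace in the comment is an assumption:

```shell
# Write a LimitRange manifest with a default memory request and limit per container.
# The values are placeholders from the Kubernetes docs example, not ONAP-tuned numbers.
cat > mem-limit-range.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
  - default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
    type: Container
EOF

# Apply it to the namespace that holds the ONAP pods ("onap" is an assumed name):
# kubectl apply -f mem-limit-range.yaml --namespace=onap
```

The defaults only affect containers created afterwards that do not declare their own memory request or limit, so already running pods would need to be recreated to pick them up.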

      2.  

        Beka, 

           Hi, as Roger mentions, the teams themselves are responsible for what is running in their containers.

           However, a couple of observations.

        You are running a newer kubectl client, 1.9.2 (try 1.8.6 to match your server). Not a big issue.

        You are running helm 2.8.0, which may have an issue with the latest change in OOM-722. Try the RI version, 2.6.1.

        OOM-722 - OOM - Run all ONAP components in one namespace Closed

         I have observed the memory footprint go to 69G after a week on an idle 122G VM on AWS, so yes, we have essentially crossed the 64G barrier. I will update the RI requirements. On a 64G machine we now saturate to 63G within 48h.

          To help you out: you can run with a reduced number of ONAP components. I have been doing this at customer sites recently.

          Unless you are running advanced use cases like vVolte or vCPE, you can delete the following components: vnfsdk, aaf, vfc

        LOG-296 - Provide user friendly deployment profiles for all component subtrees of ONAP Closed

        1. Beka Tsotsoria

          Thank you, Michael, for the suggestions. Removing these 3 components freed up ~6GB. I restarted the system and it is currently using 48G. Let's see how it goes.

      3.  

        Thank you for the analysis.  S3P is an important aspect of the Beijing release of ONAP, and part of these requirements is stability testing, where memory leaks will be flushed out.  I'd suggest you raise a bug in JIRA against Portal and VID, linked to this page.  There is nothing OOM can do to fix memory leaks other than healing the container(s) that ultimately fail, which clearly doesn't address the root of the problem.

        Cheers, Roger

        1. Beka Tsotsoria

          I've opened issues here:

          SDC-1092 - SDC-CS memory leak? Closed

          VID-196 - Memory usage growing Closed

          PORTAL-211 - High memory usage? Closed

      4.  

        Hi,

        we experienced a similar issue. In our case, the ueb-listener-sdnc container enters a faulty state just after ONAP is deployed. We need to restart it once so that it connects properly; otherwise the 64GB of memory is rapidly consumed. With ueb-listener working properly, we can run ONAP for several days without issues (Amsterdam release).

        BR

        David
