After running ONAP for about a month, we noticed the docker daemon was consuming ~17GB of memory:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1051 root      20   0 35.763g 0.017t  12284 S   5.0 27.9   4313:10 dockerd
It seems there was a bug in docker versions prior to 17.06: https://github.com/moby/moby/issues/32711
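A quick way to check whether a daemon is in the affected range (older than 17.06):

docker version --format 'Server: {{.Server.Version}}'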
I could verify it with the following output:
root@onap:~# go tool pprof --inuse_space `which dockerd` dockerd_heap
Entering interactive mode (type "help" for commands)
(pprof) top5
14.90GB of 15.27GB total (97.58%)
Dropped 1737 nodes (cum <= 0.08GB)
Showing top 5 nodes out of 22 (cum >= 0.27GB)
      flat  flat%   sum%        cum   cum%
   14.36GB 94.08% 94.08%    14.41GB 94.34%  io.copyBuffer
    0.37GB  2.42% 96.49%     0.37GB  2.42%  runtime.malg
    0.09GB  0.62% 97.11%     0.26GB  1.70%  github.com/docker/docker/daemon.(*Daemon).ContainerExecStart
    0.06GB  0.37% 97.49%     0.16GB  1.04%  github.com/docker/docker/container.AttachStreams
    0.01GB 0.093% 97.58%     0.27GB  1.79%  github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart
(pprof)
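For anyone who wants to reproduce this: the heap profile above can be pulled from the daemon's pprof endpoint, assuming the daemon is running with debug enabled ("debug": true in /etc/docker/daemon.json):

# grab a heap profile from the running daemon via the API socket
curl --unix-socket /var/run/docker.sock http://localhost/debug/pprof/heap -o dockerd_heap
# then analyse it as shown above
go tool pprof --inuse_space `which dockerd` dockerd_heap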
This is my observation: https://jira.onap.org/browse/VID-196?focusedCommentId=19602&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-19602
Good analysis - so essentially we are running as if each JVM owns the entire host, and the normal GC kick-in when the heap reaches ~67% does not occur.
Usually GC kicks in at 2/3 of the heap and forms a sawtooth pattern.
Yes, we need to set limits per pod - very good idea.
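As a rough sketch of what a per-pod limit could look like (the deployment name, namespace and sizes below are placeholders, not recommendations):

# cap a single component's memory so it no longer behaves as if it owns the host
kubectl set resources deployment portal-app -n onap --limits=memory=4Gi --requests=memory=2Gi

Combined with an explicit -Xmx inside the container, the GC sawtooth then stays below the pod limit instead of growing towards the host size.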
Beka,
Hi, as Roger mentions - the teams themselves are responsible for what is running in their containers.
However, a couple of observations.
You are running a newer kubectl client (1.9.2) than your server - try to use 1.8.6 to match it - not a big issue.
You are running helm 2.8.0 - this may have an issue with the latest change in OOM-722 - try to use the RI version, 2.6.1 (a quick version check is sketched after the JIRA reference below).
OOM-722 - OOM - Run all ONAP components in one namespace (Closed)
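For reference, the client/server versions can be confirmed like this (exact output format varies by version):

kubectl version --short
helm version --short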
I have observed the memory footprint go to 69G after a week on an idle 122G VM on AWS - so yes, we have essentially crossed the 64G barrier - I will update the RI requirements. On a 64G machine we now saturate at 63G within 48h.
To help you out - you can run with a reduced number of ONAP components - I have been doing this at customer sites recently.
Unless you are running advanced use cases like vVoLTE or vCPE, you can delete the following (one way to remove them is sketched after the list):
LOG-296 - Provide user friendly deployment profiles for all component subtrees of ONAP (Closed)
vnfsdk, aaf, vfc
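One way to remove them, assuming a pre-OOM-722 layout where each component lives in its own onap-<component> namespace (the namespace names below are assumptions - adjust to your install, or use the OOM oneclick delete script if your checkout has it):

kubectl delete namespace onap-vnfsdk onap-aaf onap-vfc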
Thank you Michael for the suggestions. Removing these 3 components freed up ~6GB. I restarted the system and currently it is using 48G. Let's see how it goes.
Thank you for the analysis. S3P is an important aspect of the Beijing release of ONAP, and part of these requirements is stability testing, where memory leaks will be flushed out. I'd suggest you raise bugs in JIRA against Portal and VID, linked to this page. There is nothing OOM can do to fix memory leaks other than healing the container(s) that ultimately fail, which clearly doesn't address the root of the problem.
Cheers, Roger
I've opened issues here:
SDC-1092 - SDC-CS memory leak? (Closed)
VID-196 - Memory usage growing (Closed)
PORTAL-211 - High memory usage? (Closed)
Hi,
we experienced a similar issue. In our case, the ueb-listener-sdnc container enters a faulty state just after deploying ONAP. We need to restart it once so that it connects properly; otherwise the 64GB of memory is rapidly consumed. With ueb-listener working properly, we can run ONAP for several days without issues (Amsterdam release).
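In case it helps others: restarting it can be as simple as deleting the pod so its Deployment/ReplicaSet recreates it (the pod and namespace names below are assumptions from our Amsterdam-style install):

kubectl get pods -n onap-sdnc | grep ueb-listener
kubectl delete pod <ueb-listener-pod-name> -n onap-sdnc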
BR
David
Has anyone ever tried to leave ONAP running for more than 24 hours? We've set it up from the master branch using the Rancher approach. Apparently 64GB of RAM is NOT enough for a full installation: OOMs were happening overnight and the kernel was killing processes randomly (including the docker daemon), and as a result the system became highly unstable. We have to restart the docker daemon once a day to mitigate the issue, but it's really annoying.
Here are the full debug logs from the docker daemon; notice that somewhere around Mar 06 02:32:07 there are messages indicating OOM: docker_daemon_Mar_05.zip
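For reference, the kernel OOM-kill events can also be spotted directly on the host, for example:

dmesg -T | grep -i 'out of memory'
journalctl -k | grep -i oom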
Yesterday we added an additional 16GB of swap and it survived the OOM; however, memory usage keeps trending upward - 68GB of RAM was utilized in total at some point today.
We noticed the highest memory consumers were Portal and VID, and they tend to require more and more memory. Here's the history:
Portal.png VID.png
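For anyone wanting to reproduce the measurement, a per-container memory snapshot can be taken directly on the docker host, for example:

docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}"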
Is such high memory usage by Portal and VID normal? Has anyone experienced a similar thing?
Beka