
Duration: 90 minutes - Recording: TSC-2018-12-20.mp4

Agenda Items

Presented By | Time | Notes/Links | JIRA Task
Casablanca Maintenance Release | 5 mins

Any Infrastructure Improvement/Plan



Any LF showstopper?

#65866 - Nexus3 proxy verified 80-100x faster downloads since 20181217

#65794 - Nexus3 timing out - still getting 0.4MB/sec - not the usual 10+MB/sec

#65809 - Nexus3 slowdown 10X - docker pulls very slow in openlab

Example of normal speed on the proxy - this pull takes about 5 sec:

sudo docker pull

The following pull will take minutes:

sudo docker pull

We are experiencing what appears to be a serious routing issue - not the older 3-hour slowdown from 4-6 months ago that was fixed. A prepull normally takes 30 min; since 20181217 a full prepull takes 120+ hours, with an 80x slowdown on individual image downloads.

Q) Why does Jenkins have no issue with Nexus3? A) They are on the same domain and don't go through an exchange.


Run a traceroute and notice:

# this is from an AWS EC2 instance in us-east-2
ubuntu@ip-172-31-10-98:~$ traceroute
traceroute to (, 30 hops max, 60 byte packets
16 (  25.657 ms (  30.723 ms  30.673 ms 

discussions/helpdesk tickets

Effect: anyone bringing up a clean ONAP system (casablanca, master, or 3.0.0-ONAP) will need 35+ hours, depending on what is deployed. For example, LOG uses DockerHub images and will come up fast, but AAF or any other pod with images over 1G will each take a couple of hours. Once casablanca is pulled you are good - those images don't change - but for master, every time you redeploy on a different day, all or some of the images will need to be pulled again.
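To put the per-image numbers above in perspective, here is a back-of-envelope sketch (hypothetical arithmetic only, using the 0.4 MB/sec and 10+ MB/sec rates reported in the tickets) of how long a single large image takes at each rate:

```python
def pull_minutes(size_gb, mb_per_sec):
    """Minutes to transfer one image at a sustained rate, using 1 GB = 1024 MB."""
    return size_gb * 1024 / mb_per_sec / 60

# Degraded rate from the tickets vs. the usual rate:
slow = pull_minutes(1.1, 0.4)   # ~47 min for one 1.1 GB image
fast = pull_minutes(1.1, 10.0)  # under 2 min at the normal 10+ MB/sec
```

At roughly 47 min per large image, a deployment with dozens of 1 GB+ images lands in the 35+ hour range described above.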

Temp workaround: an alternate proxy (to have the LF's back while they are on vacation) has been up since 20181218. It is taking 2 days to saturate with casablanca images on a 128G filesystem - ETA late Friday. As of 20h of pulling, it holds 21 images.

20181219:1700EDT status : 16h of pulls - 26 images

ubuntu@a-nexus3:~$ sudo docker images | wc -l
ubuntu@a-nexus3:~$ sudo docker images 
TAG     IMAGE ID       CREATED        SIZE
1.3.4   723d184670e2   3 weeks ago    515 MB
1.0.1   ed643f4a192c   4 weeks ago    526 MB
1.3.3   4e5784b9e283   4 weeks ago    526 MB
1.4.3   10b6b253e1a9   4 weeks ago    160 MB
1.4.3   888018330bf5   4 weeks ago    2.88 GB
1.3.2   b4012e79495e   4 weeks ago    625 MB
2.1.8   6eb295fed110   5 weeks ago    1.16 GB
2.1.8   74dcdce76094   5 weeks ago    1.16 GB
2.1.8   2a4eaa6275ff   5 weeks ago    1.16 GB
2.1.8   495a01176053   5 weeks ago    1.16 GB
2.1.8   8caa6dc681f0   5 weeks ago    1.16 GB
2.1.8   3d663698534d   5 weeks ago    1.16 GB
2.1.8   0ba25c4ec3fb   5 weeks ago    1.16 GB
2.1.8   090b326a7f11   5 weeks ago    1.14 GB
2.1.8   6506ac785cb5   5 weeks ago    1.14 GB
2.1.8   4b91e9b0b43f   5 weeks ago    323 MB
3.0.1   f8cf701eadc3   7 weeks ago    18.2 MB
3.0.1   02363fccc6c7   7 weeks ago    35.4 MB
1.2.1   00c9c28e8936   7 weeks ago    521 MB
1.2.1   4bd7ab7ae54a   7 weeks ago    512 MB
3.0.0   fc717d0b071c   2 months ago   1.17 GB
3.0.0   00a91d2dc09d   2 months ago   1.15 GB
3.0.0   d8d9137ef2d3   2 months ago   1.09 GB
1.0.0   8ec3df246a35   2 months ago   466 MB
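Cached-size totals for a listing like the one above can be summed with a small helper (a sketch, assuming `docker images`-style rows where SIZE is the last two whitespace-separated tokens):

```python
def total_gb(listing):
    """Sum the SIZE column of a `docker images`-style listing, in GB."""
    total = 0.0
    for line in listing.strip().splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[-1] not in ("GB", "MB"):
            continue  # skip the header row and anything malformed
        value = float(parts[-2])
        total += value if parts[-1] == "GB" else value / 1024
    return total
```

Handy for checking how close the proxy cache is getting to the 128G filesystem limit mentioned above.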

Slightly-not-useful random graphs (images omitted)

For reference, my private AWS proxy after 46h of pulls has 60 of 180 images - predicting 140h, or 5.8 days, in total.

ubuntu@ip-172-31-10-98:~$ sudo docker images | wc -l
# at              3.0.1               1ebc02237c1c        2 months ago        122 MB
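The ~5.8-day figure above is a straight linear extrapolation from pull progress so far; the arithmetic, as a sketch:

```python
def eta_hours(images_done, hours_elapsed, images_total):
    """Linearly extrapolate total proxy warm-up time from progress so far."""
    return hours_elapsed * images_total / images_done

total_h = eta_hours(60, 46, 180)  # 138 h, about 5.8 days
```

This ignores image-size variance, so treat it as a rough estimate rather than a hard ETA.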

A similar incoming traffic profile applies to the alternate proxy for master - larger 200G filesystem.

- access instructions, cert, installation on Cloud Native Deployment#NexusProxy

The windriver lab also has a network issue: for example, a pull from (azure) into an AWS EC2 instance takes 45 sec for 1.1G, while the same pull in an openlab VM is on the order of 10+ min. You therefore need a local nexus3 proxy if you are inside the openstack lab - I have registered a nexus3 proxy in my logging tenant - cert above.

I take this tooooo seriously

TSC-79
Task Force Update | 60 mins

E2E Process Automation

TODO: review the vFW automation in - thanks Yang Xu

TSC-53

Task Force Update | 15 mins

Pair Wise Activities Update

Dublin Pair-Wise Testing

TSC-42

TSC Activities and Deadlines

TSC Vice-Chair Election - Congratulations Lingli!

TSC-3

Incoming ONAP Events



Jan 8-11 - Dublin Release F2F Developer Design Forum (France):   

Feel free to request your VISA:

Submit your proposal:

Reminder - No TSC Call on December 27th, 2018

Zoom Chat Log

06:02:28 From Milind Jalwadi : #info Milind Jalwadi, TechMahindra Ltd.
06:02:34 From Jason Hunt : #info Jason Hunt, IBM
06:04:19 From Srini Addepalli (Intel) : #info Srinivasa Addepalli, Intel
06:04:40 From Marc Fiedler (DT) : #info Marc Fiedler, DT proxy of Andreas
06:05:00 From Jason Hunt : Sorry. :)
06:09:36 From Murat Turpcu ( Turk Telekom) : #info, Murat Turpcu Turk Telekom
06:11:38 From Dan Timoney : I think it’s a fair question why a routing change was made while LF was out for holidays.
06:12:24 From Catherine Lefevre : #65866 - Nexus3 proxy verified 80-100x faster downloads since 20181217 #65794 - Nexus3 timing out - still getting 0.4MB/sec - not the usual 10+MB/sec #65809 - Nexus3 slowdown 10X - docker pulls very slow in openlab
06:12:38 From Michael O'Brien(Amdocs,LOG,OSX) :
06:16:03 From Eric Debeau : #info presentation E2E Automation:
06:27:18 From Michael O'Brien(Amdocs,LOG,WIN) : The vFW is extremely difficult to demo to a client - with a lot of workarounds - I created that diagram back in Dec to help understand what was going on - it needs to be updated from Beijing, even with all I know about onap - I am still having issues running the simplest end to end use case
06:30:04 From Yang Xu : Integration team use Robot to automate vFW e2e in CI process for Casablanca
06:31:19 From Yang Xu : See, it had been used by release manager to report progress daily during Casablanca
06:31:20 From Michael O'Brien(Amdocs,LOG,WIN) : I'll go through the deployment scripts again in the integration repo - I did notice the new vFW checks in
06:32:56 From Yang Xu : The script is here
06:33:03 From Michael O'Brien(Amdocs,LOG,WIN) : what the goal of the vetted page was to in "exact" detail - describe every single step/fix/workaround/automation required to get from a set of 1+13 ubuntu vms - to 3 vFW deployed VMs - with some mitigation for any error in any interim step
06:36:29 From Michael O'Brien(Amdocs,LOG,WIN) : thanks yang never heard of that repo - will review - I would expect everything required to get ONAP and the vFW demo working would be in
06:39:48 From Yang Xu : there was a reason (no auto VPN access to LF network is allowed) that Gary put the code in GitHub. Everyone can access the github repo, and we will evaluate to see if we can move it within onap repo
06:40:24 From Michael O'Brien(Amdocs,LOG,WIN) : good reason - vpn issues on my end as well - thanks
06:46:01 From NingSo : #info Ning So, Reliance Jio
06:51:39 From Michael O'Brien(Amdocs,LOG,WIN) : these are the components involved in the vFW, vCPE actors are a bit different
06:51:39 From Michael O'Brien(Amdocs,LOG,WIN) :
06:53:32 From Michael O'Brien(Amdocs,LOG,WIN) : I definitely would like a completely automated one-click vFW robot script if it works - this was our goal since amsterdam
06:54:22 From Michael O'Brien(Amdocs,LOG,WIN) : we can always split the script into sections so manual workflow can be added
07:00:11 From Arash Hekmat (Amdocs) : To make ONAP use cases "generic" is to ask these two questions: On the Northbound, how can the use case be effectively managed (created, configured, monitored, deleted) by an external BSS system via the External API without having to know anything about the internal components of ONAP? On the Southbound, how can the use case work with Multiple Vendors network functions (and cloud infrastructures) without any modification to ONAP components?
07:01:15 From Michael O'Brien(Amdocs,LOG,OSX) : the wiki is a bit outdated - but to answer manual vFW (with a couple robot actions in the middle) - we have these
07:01:15 From Michael O'Brien(Amdocs,LOG,OSX) :
07:02:46 From Catherine Lefevre : Next steps: Deep dive with the PTLs
07:07:17 From Michael O'Brien(Amdocs,LOG,WIN) : Thanks Catherine - at the end we need to run ONAP in front of our team or the customer - we are the last word and must answer for all of onap's issues in the 4+ hour intense hands-on in front of the team - sometimes traumatizing
07:12:14 From Steven Wright : use cases should be at the level where they only exercise external interfaces of the ONAP platform
07:12:48 From Catherine Lefevre : #Pair-Wise Proposal
07:14:02 From Michael O'Brien(Amdocs,LOG,OSX) : I like the flow of the french language - it sound nice
07:14:44 From Srini Addepalli (Intel) : Yes Steve. +1 on that. Robot scripts are hiding the actual complexity. We need to run vFW or vDNS with robot scripts and understand the number of steps one needs to do. I feel this exercise is important to understand issues.
07:14:57 From Srini Addepalli (Intel) : s/with/without
07:17:37 From Catherine Lefevre : +1 on Srini's comments and my understanding is that what Eric and the task force have tried to identify
07:21:17 From Catherine Lefevre : wiki page shared by Eric -
07:22:31 From Michael O'Brien(Amdocs,LOG,OSX) : CD done on OOM via helm charts for 1 or more components before merge by gerrit magic word - in progress with 3 LF personnel - we are at the paying for a target VM to host a minimal k8s/oom cluster right now - usually friday at 10 meets
07:22:32 From Michael O'Brien(Amdocs,LOG,OSX) :
07:23:00 From Michael O'Brien(Amdocs,LOG,OSX) : above can run automated CSIT
07:24:10 From Michael O'Brien(Amdocs,LOG,WIN) : we have and TSC-25 - two CD pocs
07:28:00 From Michael O'Brien(Amdocs,LOG,WIN) : we have helm-verify now - we will have helm-deploy magic word in TSC-25
07:30:13 From Michael O'Brien(Amdocs,LOG,OSX) : Logging and OOM team will work very closely with integration team (Yang and Gary) to work towards full CD
07:31:13 From Michael O'Brien(Amdocs,LOG,OSX) : one issue is that in side openlab there is a network bottleneck not seen outside the lab - the fix is to preload all your docker images on all hosts - just re-iterating for anyone bringing up a test cluster there
07:35:28 From Eric Debeau : Thanks you and Happy Christmas
07:35:47 From Marc Fiedler : Merry Christmas

TSC Decisions  

2018 TSC Decisions





        The root cause must be fixed - for now we are just faster: from 2h per image down to 17 min; our goal is 45 sec.

        Some updates: the nexus3ap proxy is experiencing the same issues as any other proxy - it must deal with the same upstream latency. On 2 VMs I see different behavior:

        On a VM that had already pulled up to the E's from my own proxy, the pulls from nexus3ap were fast, as most of the layers are shared and already downloaded. As soon as I hit an image that is not cached either locally or on nexus3ap, it takes 17 min to download a 1.1G image instead of 45 seconds - better than the 2 hours previously, but not really fixed.

        On a VM that is empty, it takes the full 17 min per image to pull from nexus3ap.

        It looks like nexus3ap truncates the problem route enough to lower the download time from 2 hours to 17 min per 1.1G aaf image, for example.
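The 45 s vs 17 min gap works out to roughly a 20x difference in effective throughput for the same 1.1 GB image (a back-of-envelope sketch, not a measurement):

```python
def mb_per_sec(size_gb, seconds):
    """Effective transfer rate for one image pull, using 1 GB = 1024 MB."""
    return size_gb * 1024 / seconds

direct = mb_per_sec(1.1, 45)           # ~25 MB/sec on a healthy route
cold_proxy = mb_per_sec(1.1, 17 * 60)  # ~1.1 MB/sec via the cold nexus3ap cache
```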

        Another issue is how we handle rebuilt images for branches like Casablanca - I need to verify this fully, but hopefully we do not need to download the entire image from scratch / re-warm the proxy when the Jenkins merge jobs run daily and the hash changes.

        There are indications this will not be an issue, because of shared layers: I re-pulled images that were already downloaded the day before and only got a 5 sec cycle.

    1.0.5: Pulling from onap/org.onap.dcaegen2.collectors.datafile.datafile-app-server

    4fe2ade4980c: Already exists

    6fc58a8d4ae4: Already exists

    819f4a45746c: Pulling fs layer

    9c4800b836af: Pulling fs layer

    A clean server took 150 min to download these, for example:

    ubuntu@ip-172-31-17-47:~$ sudo docker images

    TAG     IMAGE ID       CREATED       SIZE
    2.1.8   6eb295fed110   5 weeks ago   1.16 GB
    2.1.8   74dcdce76094   5 weeks ago   1.16 GB
    2.1.8   2a4eaa6275ff   5 weeks ago   1.16 GB
    2.1.8   495a01176053   5 weeks ago   1.16 GB
    2.1.8   8caa6dc681f0   5 weeks ago   1.16 GB
    2.1.8   3d663698534d   5 weeks ago   1.16 GB
    2.1.8   0ba25c4ec3fb   5 weeks ago   1.16 GB
    2.1.8   090b326a7f11   5 weeks ago   1.14 GB
    2.1.8   6506ac785cb5   5 weeks ago   1.14 GB
    2.1.8   4b91e9b0b43f   5 weeks ago   323 MB

  2. Team, slowdown is fixed as of 20181222:1900 EST

         I am getting full speed on all downloads directly from nexus3 (no need for a workaround proxy now)

         Speed went from 0.2 MB/sec to 48 MB/sec - up by ~240x - which is back to normal. For example, an 800 MB dmaap-mr image now downloads in 16 sec on a clean VM.
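A quick sanity check of the quoted recovery numbers (hypothetical arithmetic; rates in MB/sec):

```python
before, after = 0.2, 48.0  # MB/sec before and after the routing fix
speedup = after / before   # 240x
pull_secs = 800 / after    # an ~800 MB dmaap-mr image at 48 MB/sec: ~16.7 s
```

That is consistent with the ~16 sec observed for the dmaap-mr image on a clean VM.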


         Issue closed - the routing was rerouted.

         thank you Linux Foundation