Introduction

This document explains how to run the ONAP demos on Azure using the Beijing release of ONAP.

The Beijing release has certain limitations, for which fixes/workarounds have been provided to run the demos. This document contains the details of those fixes/workarounds and the steps to deploy them.


Current Limitations of Beijing Release and Workarounds


1. Component: SO
   Issue: Custom workflow to call the MultiVIM adapter.
   Current status: A downstream image of SO, containing the custom workflow BPMN along with the MultiVIM adapter, is available on GitHub.
   Further actions: A base version of the code that supports SO-Multicloud interaction has been pushed to Gerrit, but it does not support multipart data (CSAR artifacts passed to the plugin). This needs to be upstreamed.

2. Component: MultiCloud plugin
   Issue: The current Azure plugin on ONAP Gerrit does not support the vFW and vDNS use cases.
   Current status: The downstream image from GitHub is used, and a custom chart was developed in OOM (downstream) to deploy it as part of the MultiCloud component set.
   Further actions: The azure-plugin code needs to be upstreamed to support the vFW and vDNS use cases.



High level Solution Architecture

The high-level solution architecture can be found here.

Not all ONAP components are shown in the high-level solution; only the new components/modules introduced in the solution are shown. Everything else remains the same.

Deploying ONAP on Azure using Beijing Release

ONAP needs to be deployed with the Docker images that contain the workarounds for the limitations in the Beijing release.

The OOM deployment values charts have also been modified to deploy the Docker images with the fixes.

The detailed list of changes is given below:

 

The Docker images listed below are available on the Docker Hub repo.

1. Project: OOM
   Docker image: NA
   Remarks: Contains the latest values.yaml files, which point to the downstream images of SO and multicloud-azure-plugin.

2. Project: SO
   Docker image: elhaydox/mso:1.2.2_azure-1.1.0
   Remarks: Contains the VF Module fix along with the newly developed BPMN and MultiVIM adapter.

3. Project: multicloud-azure
   Docker image: elhaydox/multicloud-azure
   Remarks: ARIA plugin to interface with Azure and instantiate VNFs.
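
If you want to inspect the downstream images before the OOM deployment, they can be pulled directly from Docker Hub (a minimal sketch; the SO tag is the one listed above, and the multicloud-azure image is pulled with its default tag):

    docker pull elhaydox/mso:1.2.2_azure-1.1.0
    docker pull elhaydox/multicloud-azure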

Deploying ONAP on Azure

Creation of Kubernetes cluster on Azure

  • Login to azure

    az login --service-principal -u <client_id> -p <client_secret> --tenant <tenant_id/document_id>

     

  •  Create a resource group  

     az group create --name <resource_group_name> --location <location_name>
  • Get the deployment templates from ONAP gerrit

     git clone -o gerrit https://gerrit.onap.org/r/integration
     cd integration/deployment/Azure_ARM_Template
  • Change arm_cluster_deploy_parameters.json file data (if required)
  • Run the deployment template  

        az group deployment create --resource-group <resource_group_name> --template-file arm_cluster_deploy_beijing.json --parameters @arm_cluster_deploy_parameters.json

    The deployment process takes around 30 minutes to complete. A cluster of 12 VMs is created on Azure (as per the parameters). The VM whose name has the post-index "0" runs the Rancher server, and the remaining VMs form the Kubernetes cluster.
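
    To check progress and confirm what was created, the standard Azure CLI commands can be used (a minimal sketch; use the resource group name chosen above - the deployment name usually defaults to the template file name):

        az group deployment show --resource-group <resource_group_name> --name arm_cluster_deploy_beijing --query properties.provisioningState
        az vm list --resource-group <resource_group_name> --output table   # lists the cluster VMs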

Deploying ONAP

  • SSH as the root user to the VM where the Rancher server is installed (the VM with post-index "0", as mentioned above).

    Helm upgrade

    When you log in to the Rancher server VM for the first time, run "helm ls" to make sure the Helm client and server are compatible. If it returns an error such as "Error: incompatible versions client[v2.9.1] server[v2.8.2]", then execute:

    helm init --upgrade
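
    After the upgrade, the versions can be re-checked (a minimal sketch):

    helm version   # client and server (Tiller) versions should now match
    helm ls        # should no longer return the incompatibility error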

  • Download the OOM repo from github (because of the downstream images)

    Clone the OOM repo on the Azure VM
    git clone -b beijing --single-branch https://github.com/onapdemo/oom.git
  • Execute the below commands in sequence to install ONAP

    cd oom/kubernetes
    make all # This will create and store the helm charts in local repo.
    helm install local/onap --name dev --namespace onap
    
    
    Due to network glitches on the public cloud, the installation sometimes fails with the error: "Error: release dev failed: client: etcd member http://etcd.kubernetes.rancher.internal:2379 has no leader". If you face this during deployment, re-install ONAP:
    helm del --purge dev
    rm -rf /dockerdata-nfs/*   # wait for a few minutes
    helm install local/onap --name dev --namespace onap
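
    Once the install command returns, the rollout can be monitored until all pods come up (a minimal sketch):

    helm ls                                      # the release "dev" should show as DEPLOYED
    kubectl get pods -n onap | grep -v Running   # shows pods that are not yet Running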

Running ONAP use-cases

Refer to the pages below to run the ONAP use cases:

  1. vFW on Azure
  2. vDNS on Azure


Building the Source Code with fixes

If you want to take a look at the fixes and build the Docker images for individual components, the source code for the fixes is available at Source Code access.


28 Comments

  1. Beijing Deployment on Azure - 20July.mp4

    This is the recording in which the Beijing deployment on Azure using OOM is showcased. The ARM template is used for this deployment.

  2. Is there a matching page with instructions for Casablanca? I tried to install Casablanca and many components installed, but I ran into the Azure CNI networking issue with "No MAC Address Found" and wasn't able to work around it. Not sure if the default install I was using installed a version of k8s that was incompatible - it seemed to be the 1.8 client with the 1.10+ server.

  3. Most of ONAP Casablanca installed, but it seems like timeouts are occurring inside the K8s cluster (I use a 9-node K8s cluster).

    Using Casablanca install:

    helm deploy dev local/onap -f /root/integration-override.yaml --namespace onap

    ...

    Release "dev-sdnc" does not exist. Installing it now.
    Error: release dev-sdnc failed: Timeout: request did not complete within allowed duration
    Release "dev-so" does not exist. Installing it now.
    Error: release dev-so failed: Internal error occurred: failed to allocate a serviceIP: client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://etcd.kubernetes.rancher.internal:2379 has no leader


    Did you run into this ?

    1. Hi Brian,

      Yes, we did run into this issue; I also mentioned it in the note above. etcd in Kubernetes runs only in cluster mode and is very sensitive to the networking between the nodes.

      In case of such a failure, I usually delete the release and then re-deploy.

      In the above case:

      helm del --purge dev-so

      helm install local/so --name dev-so --namespace onap

  4. In looking at the performance issues I am wondering if the dockerdata-nfs setup with General Purpose Storage V1 is a problem. Have you tried Premium storage for the rancher /dockerdata-nfs ?

    Is there a simple change to the Azure template for that ?

    Also - would it help k8s if we increased the liveness probe timings for the applications as well, from the current 10-30 seconds to more like 60-120 seconds?

  5. I'm also wondering if the problem might be the storage class for /var/lib/docker since it's HDD-based.

    Could we configure /var/lib/docker to be on SSD for both rancher and the k8 hosts ?

  6. Trying Standard DS4 v2 instead of Standard D4 v2 - wait % is much better for base kubernetes install which is a good sign

  7. The Casablanca OOM install puts etcd on separate VMs (m1.large); can you update the Azure templates in Gerrit for this configuration as well?


    Per Gary, this is running etcd on a separate resiliency plane (etcd=true on those three VMs/hosts).

  8. A few suggestions.


    1. Running with control plane separation does reduce % wait time - put etcd and orchestration on separate nodes using the node labels etcd=true, orchestration=true, compute=true on separate node sets (3 for etcd, 2 for orchestration and the rest for compute) - set plane separation to true (see the sketch after this list).
    2. If you have inbound network ACLs for your corporation, add an ACL to permit all traffic between the nodes on the internal network, 10.0.0.0/24 source and 10.0.0.0/24 destination.
    3. Use nodeSelector to run the Robot and SO OpenStack pods on the same node so that they can share any routing to your external OpenStack data centers.
    4. SSD host disks are definitely required.
    5. Make sure the NFS mount is working from all hosts.
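
    As an illustration of the node labelling in point 1, a minimal sketch with hypothetical node names (depending on the setup, the same labels may instead need to be applied as Rancher host labels):

        kubectl label nodes node1 node2 node3 etcd=true
        kubectl label nodes node4 node5 orchestration=true
        kubectl label nodes node6 node7 node8 node9 compute=true
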
  9. I noticed that when VMs rebooted, the hostname presented to k8s/Rancher didn't match the original registration. It seems like adding an /etc/hosts entry fixed the problem:

    10.0.0.210  node10

    for example
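
    A minimal sketch of the command that adds that entry on an affected host (same values as in the example above):

        echo "10.0.0.210  node10" >> /etc/hosts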

  10. Sudhakar Reddy, I followed the instructions and the infra got created in Azure, but when I try "helm ls" / "helm init --upgrade" it throws an error saying "Error: serializer for text/html doesn't exist". Any help is appreciated.

  11. Please let me know what the password is for the root user on the Azure VM.

    1. Looks like I have the same question; please advise where and how I can get the password for the root user of the created VMs in Azure.

      1. You must use the ssh key as onap_user to access the VMs in Azure. See deployment.txt for the path to the dynamically created key.

        --------NFS ACCESS--------
        ssh -i /home/xxxx/yyy/zzz/ONAPDlDK/id_rsa onap_user@10.10.10.10


        is an example.

        You cannot log into the AKS nodes as far as I know, but you can kubectl exec into the pods just like any other ONAP installation once you set KUBECONFIG as indicated in your deployment.txt file.

        1. Thanks, but I can't find deployment.txt, at least in the files here: https://gerrit.onap.org/r/integration

          1. It's dynamically created when you run cloud.sh (integration/deployment/aks/cloud.sh) - my instructions are for that deployment method in Azure.

            1. Thanks for your comments. I finally reset the VM password via the Azure portal GUI, logged in via SSH with the new credentials, ran sudo su, and it worked. Then I deployed ONAP from ./oom/scripts/deploy-aee.sh.

              1. I don't recognize deploy-aee.sh, so I probably have not provided you with guidance relevant to the path you are on. This page is from the Beijing release, which was last tested in Casablanca as far as I know. We mainly use integration/deployment/aks/cloud.sh for deployment in Azure since it also deploys a DevStack for instantiation testing.

                1. Actually it was my first time deploying ONAP, so I was trying to follow the guide above, and it contradicts a bit with what is done in the video, where deploy-aee.sh is used for deploying.

                  But now I have another issue: I can't access the ONAP portal. Are there any basic hints on how to find the root cause of the problem?

                  1. Which version of ONAP are you running? Portal access has changed a lot since Beijing (and I assume portal is up and passing healthchecks).

                    1. I deployed the version available on GitHub; the link is on this page above.

                      I ran the healthcheck (see the failed tests below) and some tests failed; also some pods are not running (see below). But I didn't find any guide on which tests are critical to pass and what to do if they fail, and the same goes for the pods info.

                      I also ran:

                      kubectl get svc -n onap | grep portal

                      and got this:

                      portal-app LoadBalancer 10.43.179.217 52.138.136.244 

                      When I try to access 52.138.136.244 or the DNS name, I can't get a response from the server, and I added these addresses to the hosts file. Also, I don't understand why the IP is 10.43.179.217, and when I look at this host via the Rancher GUI there is the same public IP 52.138.136.244 but a different IP for the load balancer container, 10.42.206.22.

                      By the way, is there any documentation for the ONAP release with a list of all containers per node and a high-level description of what each container does?


                      HEALTHCHECK

                      Basic AAF Health Check | FAIL |
                      Test timeout 1 minute exceeded.
                      Basic AAF SMS Health Check | FAIL |
                      500 != 200
                      Basic DCAE Health Check | FAIL |
                      500 != 200
                      Basic SDC Health Check | FAIL |
                      None != UP
                      OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp compo... | FAIL |
                      43 critical tests, 39 passed, 4 failed
                      43 tests total, 39 passed, 4 failed
                      OpenECOMP ETE.Robot.Testsuites | FAIL |
                      43 critical tests, 39 passed, 4 failed
                      43 tests total, 39 passed, 4 failed
                      OpenECOMP ETE.Robot | FAIL |
                      43 critical tests, 39 passed, 4 failed
                      43 tests total, 39 passed, 4 failed
                      OpenECOMP ETE | FAIL |
                      43 critical tests, 39 passed, 4 failed
                      43 tests total, 39 passed, 4 failed

                      --- PODS ---

                      beijing-aaf-aaf-oauth-7fc564b797-pj9fw 0/1 CrashLoopBackOff 21 1h
                      beijing-aaf-aaf-service-6d748ffc7-z7kn6 0/1 CrashLoopBackOff 21 1h
                      beijing-appc-appc-dgbuilder-668495c8c7-wp6gz 0/1 CrashLoopBackOff 20 1h
                      beijing-sdnc-sdnc-ansible-server-f8bc55468-zspxt 0/1 Error 20 1h
                      beijing-sdnc-sdnc-dgbuilder-6ccbd8cb5c-fgj9n 0/1 CrashLoopBackOff 19 1h
                      dep-service-change-handler-6bd4875b5d-2bk87 0/1 CrashLoopBackOff 19 1h

                      1. I don't think the team that created that fork back in 2018 is supporting it. In any case, Beijing code will have expired certificates and other problems.

                        I would use one of the supported deployment methods on onap.readthedocs.io based on Frankfurt. The AKS method works quite well in Azure. We use Azure daily as one of the environments in our CI chain. In your results, AAF failing will be a show stopper for several components. With regard to IP addresses, there are multiple internal net10 IP addresses assigned by Kubernetes, which you should not be using for portal access. Only the external IP addresses matter to the outside world. I didn't think the Beijing portal used a load balancer setup (or at least I remember it being difficult to set up), so that is another reason to move to Frankfurt, which is about to be released.

                        1. Ok, got it. 

                          For the Frankfurt release, is the Azure VM setup the same?

                        2. git clone -b <BRANCH> http://gerrit.onap.org/r/oom --recurse-submodules

                          What is the correct <BRANCH> for the Frankfurt release?
        2. Or is it in the oom clone from git?

  12. Look at the deployment file - there are instructions on how to use the SSH key to log in to the VMs. sudo should be active for the onap_user account.

  13. We need to upgrade to the El Alto release on the Azure instance. Please let me know if anyone has tried El Alto on Azure? If yes, please let me know the procedure, because we have an ARM template for the Beijing release but not for later releases. Please clarify.

  14. root@k8s-host-d2se0:/home/ubuntu# kubectl get svc -n onap | grep portal
    portal-app LoadBalancer 10.43.51.198 <pending> 8989:30215/TCP,8006:30213/TCP,8010:30214/TCP,8443:30225/TCP 4d5h


    We upgraded to the El Alto version on Azure, but the load balancer IP is showing pending. Please let me know how to resolve this issue...