Hi Michael O'Brien, does this include DCAE as well? I think this is the best way to install ONAP. Does it also include the config files needed to talk to an OpenStack cloud to instantiate VNFs?
I am planning to install ONAP but can't decide which setup to use: the full ONAP setup on VMs, or the Kubernetes-based setup with containers. Will both solutions be developed in the future, or will development continue with only one of them?
I see the recently added update about not being able to pull images because of missing credentials. I encountered this yesterday and was able to get a workaround done by creating the secret and embedding the imagePullSecrets to the *-deployment.yaml file.
In our current environment (namespace 1:1 → service 1:1 → pod 1:1 → docker container) it looks like the following single command will have global scope (no need to modify individual yaml files) - a slight alternative to what you have suggested, which would work as well.
So no code changes, which is good. Currently everything seems to be coming up - but my 70G VM is at 99%, so we need more HD space.
Edit: actually, even though it looked to work at first:
2017-06-30T19:31 UTC (x2) kubelet 172.17.4.99 spec.containers{sdc-es}: pulling image "nexus3.onap.org:10001/openecomp/sdc-elasticsearch:1.0-STAGING-latest"
we are still getting errors - if we wait long enough - without the per-namespace secret for each service, as in your example.
So a better fix Yves and I are testing is to put the line just after the namespace creation in createAll.bash
I'm surprised that it appears to work for you, as it doesn't for my environment. First, you should have to specify the imagePullSecrets for it to work... that can either be done in the yaml or by using the patch serviceaccount command. Second, the scope of the secret for imagePullSecrets is just that namespace:
Pods can only reference image pull secrets in their own namespace, so this process needs to be done one time per namespace.
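For anyone following along, here is a minimal sketch of that per-namespace approach (the namespace, secret name and credentials are placeholders - adjust to whatever createAll.bash ends up using):

# create the registry secret in the target namespace (repeat per namespace)
kubectl -n onap-sdc create secret docker-registry onap-docker-registry-key \
  --docker-server=nexus3.onap.org:10001 \
  --docker-username=<nexus-user> --docker-password=<nexus-password> \
  --docker-email=none@onap.org
# attach it to the default service account so pods in that namespace can pull
kubectl -n onap-sdc patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "onap-docker-registry-key"}]}'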
In your environment, had you previously pulled the images before? I noticed in my environment that it would find a previously pulled image even if I didn't have the authentication credentials. To test that out, I had to add " imagePullPolicy: Always " to the *-deployment.yaml file under the container scope, so it would always try to pull it.
So I think a fix is necessary. Should I submit a suggested change to the createAll.bash script that creates the secret and updates the service account in each namespace?
We previously saw a successful pull from nexus3 - but that turned out to be a leftover mod in my branch yaml for a specific pod.
Yes, I should know in about 10 min (I'm in the middle of a redeploy) whether I need to patch - it makes sense, because otherwise it would assume a magical 1:1 association - what if I created several secrets?
I'll adjust and retest.
btw, thanks for working with us getting Kubernetes/oom up!
My test of the updated create_namespace() method eliminated all of the "no credentials" errors. I have plenty of other errors (most seem to be related to the readiness check timing out), but I think this one is licked.
Is there a better way to track this than the comments here? Jira?
Actually our mso images loaded fine after internal retries - bringing up the whole system (except dcae) - so this is without a secret override on the yamls that target nexus3.
It includes your patch line from above
My vagrant vm ran out of HD space at 19G - resizing
Resizing won't work on the coreos image - moving up one level of virtualization, from (docker on virtualbox on vmware-rhel73 in win10) to (docker on virtualbox on win10).
vid still failing on FS
Failed to start container with docker id 47b63e352857 with error: Error response from daemon: {"message":"oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"rootfs_linux.go:54: mounting \\\\\\\"/dockerdata-nfs/onapdemo/vid/vid/lf_config/vid-my.cnf\\\\\\\" to rootfs \\\\\\\"/var/lib/docker/overlay2/0638a5d171ddacf7346133ee5e53104992243e897370bb054383f2e121e5d63f/merged\\\\\\\" at \\\\\\\"/var/lib/docker/overlay2/0638a5d171ddacf7346133ee5e53104992243e897370bb054383f2e121e5d63f/merged/etc/mysql/my.cnf\\\\\\\" caused \\\\\\\"not a directory\\\\\\\"\\\"\"\n: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type"}
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-aai.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal Error syncing pod
vid-mariadb-1108617343-zgnbd onap-vid Waiting: rpc error: code = 2 desc = failed to start container "c4966c8f8dbfdf460ca661afa94adc7f536fd4b33ed3af7a0857ecdeefed1225": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36/etc/mysql/my.cnf\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-vid.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal Error: failed to start container "vid-mariadb": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36/etc/mysql/my.cnf\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""} Error syncing pod
Hi, OOM-3 has been deprecated (it is in the closed state) - the secrets fix is implemented differently now - you don't need the workaround.
Also, the search line limit message is a bug in rancher that you can ignore - it is a warning that more than 5 dns search terms were used - not an issue - see my other comments on this page.
The only real issue is "Error syncing pod" - this is (most likely) an intermittent timing issue that we are working on - a faster system with more cores should see less of it.
If you only have 2 working pods - you might not have run the config-init pod - verify you have /dockerdata-nfs on your host FS.
But yes, again, many of them are stuck with the same error: "Error syncing pod".
And yes, the server I am using now has 128GB RAM. (Though I have configured the proxy in the best known manner - do you think this could also relate to the proxy? If so, I will dig more in that direction.)
Update: containers are loading now - for example both pods for VID come up ok if we first run the config-init pod to bring up the config mounts. Also there is an issue with unresolved DNS entries that is fixed temporarily by adding to /etc/resolv.conf
Good news – 32 of 33 pods are up (sdnc-portal is going through a restart).
Ran 2 parallel Rancher systems on 48G Ubuntu 16.04.2 VMs on two 64G servers.
Stats: Without DCAE (which is up to 40% of ONAP) we run at 33G – so I would expect a full system to be around 50G which means we can run on a P70 Thinkpad laptop with 64G.
Had to add some dns-search domains for k8s in /etc/network/interfaces so they appear in resolv.conf after running the config pod.
Issues:
After these 2 config changes the pods come up within 25 min, except policy-drools, which takes 45 min (on one machine but not the other), and sdnc-portal (which is having issues with some node downloads).
Michael O'Brien (deprecated as of 20170508 - use obrienlabs): I've got to the point where I can access the portal login page, but after inputting the credentials it keeps redirecting to port 8989 and fails, instead of using the external mapped port (30215 in my case). Any thoughts?
I'm running on GCE with 40GB and only running sdc, message-router and portal for now.
I ran the OOM installation from scratch and managed to log in to the Portal by changing the port back to 30215 after the login redirect.
Also, when I log in with the cs0008 user and click on SDC, I get: "can’t establish a connection to the server at sdc.api.simpledemo.onap.org:8181" (should this be changed to port 30206?)
Do you know which config has to be changed for this?
Are you accessing the ECOMP Portal via the 'onap-portal vnc-portal-1027553126-h6dhd' container?
This container was added to the standard ONAP deployment so one may VNC into the ONAP deployment instance (namespace) and have networking fully resolved within K8s.
The Docker processes are not running on their own, possibly because internet access goes through a proxy. I am trying to run the install and setup manually by logging in to each component.
Hi, there are a combination of files - some are in the container itself - see /var/opt
some are off the shared file system on the host - see /dockerdata-nfs
In the case of robot, you have spun up one pod - each pod has a single docker container. To see the other pods/containers, kubectl into each as you have into robot - just change the pod name. kubectl is an abstraction on top of docker, so you don't need to access docker containers directly.
Yes, I can see the mounted directories and found robot_install.sh in /var/opt/OpenECOMP_ETE/demo/boot
On the K8s Dashboard and CLI the pods are in the running state, but when I log in (via kubectl) to any of them I am unable to see any docker process running via docker ps (even docker itself is not installed).
I think this is ideally taken care of by the pod itself, right? Or do we need to go inside each component and run its specific installation script?
Vaibhav, Hi, the architecture of kubernetes is such that it manages docker containers - we are not running docker on docker. Docker ps will only be possible on the host machine(s)/vm(s) that kubernetes is running on - you will see the wrapper docker containers running the kubernetes and rancher undercloud.
When you "kubectl exec -it" - into a pod you have entered a docker container the same as a "docker exec -it" at that point you are in a container process, try doing a "ps -ef | grep java" to see if a java process is running for example. Note that by the nature of docker most containers will have a minimal linux install - so some do not include the ps command for example.
If you check the instructions above you will see the first step is to install docker 1.12 only on the host - as you end up with 1 or more hosts running a set of docker containers after ./createAll.bash finishes
example - try the mso jboss container - it is one of the heavyweight containers
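A quick sketch of that (the pod name below is just the one from my environment - substitute the output of kubectl get pods):

# find the mso pod, then exec into its jboss container
kubectl get pods -n onap-mso
kubectl -n onap-mso exec -it mso-371905462-w0mcj -- bash
# inside the container, check for the jboss java process
ps -ef | grep java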
if you want to see the k8s wrapped containers - do a docker ps on the host
root@ip-172-31-93-122:~# docker ps | grep mso
9fed2b7ebd1d nexus3.onap.org:10001/openecomp/mso@sha256:ab3a447956577a0f339751fb63cc2659e58b9f5290852a90f09f7ed426835abe "/docker-files/script" 4 days ago Up 4 days k8s_mso_mso-371905462-w0mcj_onap-mso_11da22bf-8b3d-11e7-9e1a-0289899d0a5f_0
e4171a2b73d8 nexus3.onap.org:10001/mariadb@sha256:3821f92155bf4311a59b7ec6219b79cbf9a42c75805000a7c8fe5d9f3ad28276 "/docker-entrypoint.s" 4 days ago Up 4 days k8s_mariadb_mariadb-786536066-87g9d_onap-mso_11bc6958-8b3d-11e7-9e1a-0289899d0a5f_0
f099c5613bf1 gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 days ago Up 4 days k8s_POD_mariadb-786536066-87g9d_onap-mso_11bc6958-8b3d-11e7-9e1a-0289899d0a5f_0
Hi all, I am new to the kubernetes installation of ONAP and have problems cloning the onap repository. I tried git clone -b release-1.0.0 http://gerrit.onap.org/r/oom but ended up with the following error: fatal: unable to access 'http://gerrit.onap.org/r/oom/': The requested URL returned error: 403
I also tried ssh: git clone -b release-1.0.0 ssh://cnleng@gerrit.onap.org:29418/oom, but I cannot access the settings on https://gerrit.onap.org (I already have a Linux Foundation account) to copy my ssh keys. Any help will be appreciated. Thanks
Hi, I am trying to install ONAP components though oom, but getting the following errors:
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-appc.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal
I tried to edit /etc/resolv.conf according to Michael's comment above:
Geora, hi, that is unfortunately a red herring - there is a bug in rancher where more than 5 domains are added to the search tree - you can ignore these. The resolv.conf edit turns out to have no effect - it has been removed and only survives in the comment history.
Has anyone managed to run ONAP on Kubernetes with more than one node? I'm unclear about how the /dockerdata-nfs volume mount works in the case of multiple nodes.
1) In my azure setup, I have one master node and 4 agent nodes (Standard D3 - 4 CPU / 14GB). After running the config-init pod (and it completing), I do not see the /dockerdata-nfs directory being created on the master node. I am not sure how to check this directory on all the agent nodes. Is this directory expected to be created on all the agent nodes? If so, are they kept synchronized?
2) After the cluster is restarted, there is a possibility that pods will run on a different set of nodes, so if /dockerdata-nfs is not kept in sync between the agent nodes, the data will not be persisted.
PS: I did not use rancher; I created the k8s cluster using acs-engine.
The mounting of the shared dockerdata-nfs volume does not appear to happen automatically. You can install nfs-kernel-server and mount a shared drive manually. If you are running rancher on the master node (the one with the files in the /dockerdata-nfs directory), mount that directory on the agent nodes:
On Master:
# apt-get install nfs-kernel-server
Modify /etc/exports to share the directory from the master to the agent nodes.
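A rough sketch of the manual NFS setup (paths are examples, assuming a stock Ubuntu 16.04 host - adjust to your environment):

# on the master: export the shared directory
echo "/dockerdata-nfs *(rw,no_root_squash,no_subtree_check)" >> /etc/exports
exportfs -ra
systemctl restart nfs-kernel-server
# on each agent node: install the client and mount the share
apt-get install -y nfs-common
mkdir -p /dockerdata-nfs
mount <master-ip>:/dockerdata-nfs /dockerdata-nfs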
I am trying to install ONAP on Kubernetes and I got the following error while trying to run ./createAll.bash -n onap -a robot|appc|aai command:
Command 'mppc' from package 'makepp' (universe) Command 'ppc' from package 'pearpc' (universe) appc: command not found No command 'aai' found, did you mean: Command 'axi' from package 'afnix' (universe) Command 'ali' from package 'nmh' (universe) Command 'ali' from package 'mailutils-mh' (universe) Command 'aa' from package 'astronomical-almanac' (universe) Command 'fai' from package 'fai-client' (universe) Command 'cai' from package 'emboss' (universe) aai: command not found
Does anyone have an idea? (kubernetes /helm is already up and running)
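One likely cause (an assumption based on the error text, which shows the shell trying to run appc and aai as commands): the unquoted | characters in -a robot|appc|aai are being interpreted by the shell as pipes. Running createAll.bash once per application avoids this:

./createAll.bash -n onap -a robot
./createAll.bash -n onap -a appc
./createAll.bash -n onap -a aai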
Hi Michael O'Brien, I am trying to install ONAP the way described above and encountered a problem.
The hbase pod in kubernetes reports "Readiness probe failed: dial tcp 10.42.76.162:8020: getsockopt: connection refused". It seems the hbase service is not started as expected. The container named hbase logs the following in Rancher:
Starting namenodes on [hbase] hbase: chown: missing operand after '/opt/hadoop-2.7.2/logs' hbase: Try 'chown --help' for more information. hbase: starting namenode, logging to /opt/hadoop-2.7.2/logs/hadoop--namenode-hbase.out localhost: starting datanode, logging to /opt/hadoop-2.7.2/logs/hadoop--datanode-hbase.out Starting secondary namenodes [0.0.0.0] 0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.2/logs/hadoop--secondarynamenode-hbase.out starting zookeeper, logging to /opt/hbase-1.2.3/bin/../logs/hbase--zookeeper-hbase.out starting master, logging to /opt/hbase-1.2.3/bin/../logs/hbase--master-hbase.out starting regionserver, logging to /opt/hbase-1.2.3/bin/../logs/hbase--1-regionserver-hbase.out
Nexus3 usually has intermittent connection issues - you may have to wait up to 30 min. Yesterday I was able to bring it up on 3 systems with the 20170906 tag (all outside the firewall).
I assume MSO (earlier in the startup) worked - so you don't have a proxy issue
Sent: Friday, September 8, 2017 14:36 To: onap-discuss@lists.onap.org Subject: [onap-discuss] [oom] config pod changes
OOM users,
I’ve just pushed a change that requires a re-build of the /dockerdata-nfs/onap/ mount on your K8s host.
Basically, what I’ve tried to do is port over the heat stack version of ONAP’s configuration mechanism. The heat way of running ONAP writes files to /opt/config/ based on the stack’s environment file, which has the details related to each user’s environment. These values are then swapped into the various VMs' containers using scripts.
Now that we are using helm for OOM, I was able to do something similar in order to start trying to run the vFW/vLB demo use cases.
I have also been made aware that this change requires K8s 1.6 as I am making use of the “envFrom” https://kubernetes.io/docs/api-reference/v1.6/#container-v1-core. We stated earlier that we are setting minimum requirements of K8s 1.7 and rancher 1.6 for OOM so hopefully this isn’t a big issue.
It boils down to this:
/oom/kubernetes/config/onap-parameters.yaml is kind of like file “onap_openstackRC.env” and you will need to define some required values otherwise the config pod deployment will fail.
Hi, I am trying to install ONAP on Kubernetes and encountered a problem.
I create the msb pods first with the command "./createAll.bash -n onap -a msb", then create the aai pods with "./createAll.bash -n onap -a aai". The problem is that none of the aai serviceNames and urls register to msb as expected. I see that the aai project code has these lines: "
Goal: I want to deploy and manage vFirewall router using ONAP.
I installed ONAP on Kubernetes using oom(release-1.0.0). All Services are running except DCAE as it is not yet completely implemented in Kubernetes. Also, I have an OpenStack cluster configured separately.
How can I integrate DCAE to the above Kubernetes cluster?
{"log":"Waiting for resources to be up\n","stream":"stdout","time":"2017-09-21T18:23:53.274547381Z"} {"log":"aai-resources.api.simpledemo.openecomp.org: forward host lookup failed: Unknown host\n","stream":"stderr","time":"2017-09-21T18:23:58.279615776Z"} {"log":"Waiting for resources to be up\n","stream":"stdout","time":"2017-09-21T18:23:58.279690784Z"}
I am using the OOM 1.1.0 version. I have pre-pulled all the images using prepull_docker.sh, but after creating the pods with the createAll.bash script all the pods come up except DCAE. Is DCAE supported in the 1.1.0 release? If not, when is it expected to be functional? Will I be able to run the vFW demo closed loop without DCAE?
More details below:
The DCAE specific images shown are:
root@hcl:~# docker images | grep dcae
nexus3.onap.org:10001/openecomp/dcae-controller 1.1-STAGING-latest ff839a80b8f1 12 weeks ago 694.6 MB
nexus3.onap.org:10001/openecomp/dcae-collector-common-event 1.1-STAGING-latest e3daaf41111b 12 weeks ago 537.3 MB
nexus3.onap.org:10001/openecomp/dcae-dmaapbc 1.1-STAGING-latest 1fcf5b48d63b 7 months ago 328.1 MB
The DCAE health check is failing
Starting Xvfb on display :88 with res 1280x1024x24
ConnectionError: HTTPConnectionPool(host='dcae-controller.onap-dcae', port=8080): Max retries exceeded with url: /healthcheck (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f26aee31550>: Failed to establish a new connection: [Errno -2] Name or service not known',))
Vidhu, hi, DCAE was in 1.0 of OOM on 28 Sept 2017 - however for R1/Amsterdam the new DCAEGEN2 project was only done in HEAT. There is an effort to move the containers to Kubernetes, an effort to use the developer setup with 1 instead of 7 cdap hadoop nodes, and an effort to complete the bridge for the hybrid HEAT/Kubernetes setup - specific only to DCAEGEN2. One or more of these should be in shortly as we work with the DCAE team. You are welcome to help both teams with this large effort.
While oneclick/createAll.bash includes DCAEGEN2 pod creation, the automation script cd.sh hits the ERROR condition when creating DCAEGEN2 because createAll.bash expects /home/ubuntu/.ssh/onap_rsa to exist. Here's some output from the console log of one of today's Jenkins runs (http://jenkins.onap.info/job/oom-cd/1853/consoleFull):
19:21:03 ********** Creating deployments for dcaegen2 **********
19:21:03 Creating namespace **********
19:21:03 namespace "onap-dcaegen2" created
19:21:03 Creating service account **********
19:21:03 clusterrolebinding "onap-dcaegen2-admin-binding" created
19:21:03 Creating registry secret **********
19:21:03 secret "onap-docker-registry-key" created
19:21:03 Creating deployments and services **********
19:21:03 ERROR: /home/ubuntu/.ssh/onap_rsa does not exist or is empty. Cannot launch dcae gen2.
19:21:03 ERROR: dcaegen2 failed to configure: Pre-requisites not met. Skipping deploying it and continue
19:21:04
Yes, DCAEGEN2 works via OOM- I verified it last friday. However only in the amsterdam release with the proper onap-parameters.yaml (will be ported to Beijing/master shortly).
Hi, I sense that there is a bit of a lack of information here, which I would be happy to acquire.
There is a file that describes the onap environment, "onap-parameters.yaml". I think it would be good practice to provide guidance on how to fill it in (or how to acquire the values that should reside in it).
Mor, You are welcome to help us finish the documentation for OOM-277
The config was changed on friday - those of us here are playing catch-up on some of the infrastructure changes as we are testing the deploys every couple of days - you are welcome to add to the documentation here - usually the first to encounter an issue/workaround documents it, so the rest of us can benefit.
Most of the content in this tutorial is added by developers like yourself who would like to get OOM deployed and fully functional - at ONAP we self-document anything that is missing.
There was a section added on friday for those switching from the old-style config to the new - you run a helm purge
The configuration parameters will be specific to your rackspace/openstack config - usually you match your rc export. There is a sample posted from before when it was in the json file in mso - see the screen cap.
The major issue is that so far no one using pure public ONAP has actually deployed a vFirewall yet (mostly due to stability issues with ONAP that are being fixed).
First verify that your portal containers are running in K8s (including the vnc-portal). Make notice of the 2/2 and 1/1 Ready states. If a 0 is on the left of those numbers then the container is not fully running.
Hi, is there a page available where we could find any sort of updated list/diagram of the dependencies between the different onap components? Also is there a breakdown of the memory requirements for the various oom components?
No official documentation on the dependencies at this point. But a very good idea to add. I will look into doing this.
For now you can see the dependencies in each of the deployment descriptors like in the AAI traversal example (see below) that depends on aai-resource and hbase containers before it starts up. In OOM we make use of Kubernetes init-containers and readiness probes to implement the dependencies. This prevents the main container in the deployment descriptor from starting until its dependencies are "ready".
oom/kubernetes/aai/templates] vi aai-traversal-deployment.yaml
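On a running system you can also list those init containers directly to see what a pod is waiting on (the pod name below is from my environment; this assumes a K8s version where initContainers appear in the pod spec):

# show the readiness/init containers that aai-traversal waits on
kubectl -n onap-aai get pod aai-traversal-3982333463-vb89g \
  -o jsonpath='{.spec.initContainers[*].name}'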
Samuel, to add to the dependency discussion by Mike - ideally I would like to extend the deployment diagram below with the dependencies listed in the yamls he refers to.
The diagram can be edited by anyone - I will take time this week and update it.
Hi, VFC is still a work in progress - the VFC team is working through issues with their containers. You don't currently need VFC for ONAP to function - you can comment it out of the oneclick/setenv.bash helm line (ideally we would leave out services that are still WIP).
I am trying to bring up ONAP using Kubernetes. Can you please tell me whether I should pull only OOM release-1.0.0, or whether a pull from the master branch is also fine, to get ONAP up and running and to run the demo on it?
Rajesh, Hi, the latest master is 1.1/R1 - the wiki is now targeting 1.1 - I'll remove the 1.0 link. Be aware that ONAP in general is undergoing stabilization at this point.
I am getting the same error as a few people above when it comes to accessing SDC where it says I am not authorized to view this page, and it also gives me a 500 error. My initial impression is that this might be because I cannot reach the IP corresponding to the sdc.api.simpledemo.openecomp.org in the /etc/hosts file from my vnc container.
Could anybody confirm if this may cause an issue? And if so, which container/host/service IP should be paired with the sdc url?
Actually, I believe the resolution is correct, as it maps to the sdc-fe service, and if I change the IP to any other service the sdc web page times out. Also, if I curl<sdc-url>:8080 I do get information back. I am still not sure what might be causing this issue. Currently I am trying to look through the sdc logs for hints, but no luck as of yet
Actually those are for sdc-be; I see a chef error on sdc-es - but the pod starts up ok (need to verify the endpoints though). Also, this pod is not slated for the elk filebeat sister container - it should be.
[2017-10-14T11:06:17-05:00] ERROR: cookbook_file[/usr/share/elasticsearch/config/kibana_dashboard_virtualization.json] (sdc-elasticsearch::ES_6_create_kibana_dashboard_virtualization line 1) had an error: Chef::Exceptions::FileNotFound: Cookbook 'sdc-elasticsearch' (0.0.0) does not contain a file at any of these locations:
files/debian-8.6/kibana_dashboard_virtualization.json
files/debian/kibana_dashboard_virtualization.json
files/default/kibana_dashboard_virtualization.json
files/kibana_dashboard_virtualization.json
This cookbook _does_ contain: ['files/default/dashboard_BI-Dashboard.json','files/default/dashboard_Monitoring-Dashboared.json','files/default/visualization_JVM-used-CPU.json','files/default/visualization_JVM-used-Threads-Num.json','files/default/visualization_number-of-user-accesses.json','files/default/logging.yml','files/default/visualization_JVM-used-Memory.json','files/default/visualization_host-used-Threads-Num.json','files/default/visualization_Show-all-certified-services-ampersand-resources-(per-day).json','files/default/visualization_Show-all-created-Resources-slash-Services-slash-Products.json','files/default/visualization_host-used-CPU.json','files/default/visualization_Show-all-distributed-services.json']
[2017-10-14T11:06:17-05:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
Getting a chef exit on missing elk components in sdc-es - even though this one is not slated for the sister filebeat container - likely a reused script across all pods in sdc - will take a look.
root@obriensystemsu0:~/onap/oom/kubernetes/oneclick# kubectl logs -f -n onap-aai aai-traversal-3982333463-vb89g aai-traversal
Cloning into 'aai-config'... [2017-10-14T10:50:36-05:00] INFO: Started chef-zero at chefzero://localhost:1 with repository at /var/chef/aai-config One version per cookbook environments at /var/chef/aai-data/environments
[2017-10-14T10:50:36-05:00] INFO: Forking chef instance to converge... Starting Chef Client, version 13.4.24 [2017-10-14T10:50:36-05:00] INFO: *** Chef 13.4.24 *** [2017-10-14T10:50:36-05:00] INFO: Platform: x86_64-linux [2017-10-14T10:50:36-05:00] INFO: Chef-client pid: 43 [
I am trying to set up ONAP using Kubernetes. I am using rancher to set up the Kubernetes cluster. I have 5 machines with 16GB memory each and configured kubernetes successfully. When I run createAll.bash to set up the ONAP application, some of the components are successfully configured and running, but some of the components are failing with an "ImagePullBackOff" error.
When I try to pull images independently I am able to download them from nexus successfully, but not when running through the createAll script. When I went through the script everything seemed fine, and I am not able to understand what is wrong. Could you please help me understand the issue?
Hi, try running the docker pre pull script on all of your machines first. Also you may need to duplicate /dockerdata-nfs across all machines - manually or via a shared drive.
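A rough sketch of that, assuming password-less ssh between the hosts (the host name is a placeholder):

# on every kubernetes host
./prepull_docker.sh
# then copy the shared config from the host that ran the config-init pod to the others
rsync -avz /dockerdata-nfs/ ubuntu@k8s-node-2:/dockerdata-nfs/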
Yes, we have been getting this since last friday - I have been too busy to raise an issue as normal. This is not as simple as onap-parameters.yaml; it looks like a robot change related to the SO rename - I will post a JIRA/workaround shortly. Anyway, SO is not fully up on OOM/Heat currently.
I have brought up ONAP using the OOM master branch, which I pulled yesterday. But on running the health check I am facing similar issues as discussed above, where MSO fails with a 503 error, and I also see portal failing with a 404 error.
Can you please let us know if there is any workaround for this issue, or whether there is any build where the components necessary for running the vFW/vDNS demos - portal, SDC, AAI, SO, VID, SDNC, Policy and DCAE - are healthy?
How do I set/correct the missing values in the health check? How do I know if everything should be working with a current deployment?
root@onap-oom-all-in-one:/dockerdata-nfs/onap/robot# ./ete-docker.sh health
Starting Xvfb on display :88 with res 1280x1024x24
Executing robot tests at log level TRACE
==============================================================================
OpenECOMP ETE
==============================================================================
OpenECOMP ETE.Robot
==============================================================================
OpenECOMP ETE.Robot.Testsuites
==============================================================================
[ ERROR ] Error in file '/var/opt/OpenECOMP_ETE/robot/resources/clamp_interface.robot': Setting variable '${CLAMP_ENDPOINT}' failed: Variable '${GLOBAL_CLAMP_SERVER_PROTOCOL}' not found. Did you mean:
${GLOBAL_DCAE_SERVER_PROTOCOL}
${GLOBAL_APPC_SERVER_PROTOCOL}
${GLOBAL_MR_SERVER_PROTOCOL}
${GLOBAL_MSO_SERVER_PROTOCOL}
${GLOBAL_AAI_SERVER_PROTOCOL}
${GLOBAL_ASDC_SERVER_PROTOCOL}
[ ERROR ] Error in file '/var/opt/OpenECOMP_ETE/robot/resources/msb_interface.robot': Setting variable '${MSB_ENDPOINT}' failed: Variable '${GLOBAL_MSB_SERVER_PROTOCOL}' not found. Did you mean:
${GLOBAL_MSO_SERVER_PROTOCOL}
${GLOBAL_MR_SERVER_PROTOCOL}
${GLOBAL_ASDC_SERVER_PROTOCOL}
${GLOBAL_SDNGC_SERVER_PROTOCOL}
${GLOBAL_VID_SERVER_PROTOCOL}
${GLOBAL_AAI_SERVER_PROTOCOL}
${GLOBAL_DCAE_SERVER_PROTOCOL}
${GLOBAL_APPC_SERVER_PROTOCOL}
OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp components are...
==============================================================================
Basic DCAE Health Check [ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e8f955fd0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /gui
[ WARN ] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e8fe14350>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /gui
[ WARN ] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e8fda87d0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /gui
| FAIL |
ConnectionError: HTTPConnectionPool(host='dcae-controller.onap-dcae', port=9998): Max retries exceeded with url: /gui (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e8de52250>: Failed to establish a new connection: [Errno -2] Name or service not known',))
------------------------------------------------------------------------------
Basic SDNGC Health Check | PASS |
------------------------------------------------------------------------------
Basic A&AI Health Check | PASS |
------------------------------------------------------------------------------
Basic Policy Health Check | PASS |
------------------------------------------------------------------------------
Basic MSO Health Check | FAIL |
503 != 200
------------------------------------------------------------------------------
Basic ASDC Health Check | PASS |
------------------------------------------------------------------------------
Basic APPC Health Check | PASS |
------------------------------------------------------------------------------
Basic Portal Health Check | PASS |
------------------------------------------------------------------------------
Basic Message Router Health Check | PASS |
------------------------------------------------------------------------------
Basic VID Health Check | PASS |
------------------------------------------------------------------------------
Basic Microservice Bus Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
Basic CLAMP Health Check | FAIL |
Variable '${CLAMP_ENDPOINT}' not found.
------------------------------------------------------------------------------
catalog API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
emsdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
gvnfmdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
huaweivnfmdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
jujuvnfmdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
multicloud API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
multicloud-ocata API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
multicloud-titanium_cloud API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
multicloud-vio API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
nokiavnfmdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
nslcm API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
resmgr API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
usecaseui-gui API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
vnflcm API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
vnfmgr API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
vnfres API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
workflow API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
ztesdncdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
ztevmanagerdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp compo... | FAIL |
31 critical tests, 8 passed, 23 failed
31 tests total, 8 passed, 23 failed
==============================================================================
OpenECOMP ETE.Robot.Testsuites | FAIL |
31 critical tests, 8 passed, 23 failed
31 tests total, 8 passed, 23 failed
==============================================================================
OpenECOMP ETE.Robot | FAIL |
31 critical tests, 8 passed, 23 failed
31 tests total, 8 passed, 23 failed
==============================================================================
OpenECOMP ETE | FAIL |
31 critical tests, 8 passed, 23 failed
31 tests total, 8 passed, 23 failed
==============================================================================
Output: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_14804/output.xml
Log: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_14804/log.html
Report: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_14804/report.html
A persistent NFS mount is recommended in the official docs - this is a collaborative wiki - as in join the party of overly enthusiastic developers - in my case I run on AWS EBS so not an issue - you are welcome to help document the ecosystem.
The sky at OOM is a very nice shade of blue!
Sorry I am super excited about the upcoming developer conference on 11 Dec.
In my setup, I am able to start the ONAP components only if all the images already are downloaded using prepull_docker.sh. So far, I have been able to start all aai components using "createAll.bash -n onap -a aai" after the images have been downloaded using prepull_docker.sh.
Here are the challenges I am facing
"nexus3.onap.org:10001/onap/clamp" is downloaded in the local docker repository but "kubectl get pods --all-namespaces | grep clamp" fails with the following error
Thanks Beili. Below is the error I get for clamp. Looks like clamp is expecting some configuration, specifically password. Any clues on the specific configuration which needs to be updated?
*************************** APPLICATION FAILED TO START ***************************
Description:
Binding to target org.onap.clamp.clds.config.EncodedPasswordBasicDataSource@53ec2968 failed:
Property: spring.datasource.camunda.password Value: strong_pitchou Reason: Property 'password' threw exception; nested exception is java.lang.NumberFormatException: For input string: "st"
Use the recommended subset (essentially the ONAP 1.0 components from the original seed code in Feb 2017 - these work with the vFirewall use case) until we stabilize the R1 release.
Clamp, aaf, and vfc are currently still being developed - there are usually a couple of pod failures in these components - I will post the JIRAs. These are known issues and are being worked on in the OOM JIRA board.
You don't need these 3 components to run the vFirewall - for now I would exclude them in HELM_APPS in setenv.bash - later when they are stable you can add them back.
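As a sketch, the exclusion is just a matter of trimming the HELM_APPS list in oneclick/setenv.bash before running createAll.bash - the list below is only an illustration of the 1.0-era subset mentioned above, not an official recommendation:

# oneclick/setenv.bash - leave out clamp, aaf and vfc until they stabilize
HELM_APPS=('mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc')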
Yes, I've been thinking about this for some time - and I have seen issues where we don't pick up problems we should have, for example the openecomp-to-onap refactor earlier this week. As you know from the TSC meeting yesterday, the manifest is still in flux in the move to the dockerhub versions.
I am not sure yet - but I would expect that master continues to pull from nexus/nexus3, and the R1 branch pulls from dockerhub - but need to verify - put a watch on the JIRA - I usually update them with critical info/links/status
I have successfully started onap on kubernetes with the below apps in setenv.sh. All pods show 1/1 running, but when I log in to the portal I only see SDC. Why are the other modules not appearing in the portal?
Thanks Rahul Sharma. I have encountered another issue: SDC keeps giving me a 500 error saying I am not authorized to view this page when I log in as cs0008. I see in the comments above that this is a known issue. Is there a workaround for this, or can I pull older/stable code to avoid it?
This is a great accomplishment for us to start playing with - thanks a lot Amar and Prakash for your effort putting things together. One thing I mentioned earlier in the call: we probably need to review whether to keep using Docker 1.12 (2 years old). Docker moved to 1.13 last year and is now split into Docker CE (public) and Docker EE (Enterprise), with version numbers based on the year (17.x in 2017, 18.x in 2018).
Also, Rancher is not mandatory just to build Kubernetes. I have met several customers using Kubernetes in production, and we can build Kubernetes 1.6, 1.7 or 1.8 quite easily now using kubeadm in a few minutes (skipping Rancher). Rancher is good for other use cases where customers need a multi-orchestrator environment (K8s, Mesos, Swarm), but I don't see real value in keeping Rancher in our ONAP document, where it might confuse people into thinking Rancher is mandatory just for bringing up K8s.
Another thing: I attended the last Docker conference, and Kubernetes will soon support containerd, in which case the CLI command will be "crictl" rather than "kubectl", allowing Kubernetes to work directly with containerd and thus improving performance, which ONAP will fully benefit from (GA at the end of 2017). We probably need to closely follow where the Kubernetes community is heading and update our documentation accordingly. It is kind of difficult to update our documentation every month, but keeping up with Kubernetes is a good way to catch up, in my opinion...
I agree - we will move from docker 1.12 when we move from Rancher 1.6.10 to Rancher 2.0, where we can use 17.x - but it is a Rancher + Docker + Kubernetes config issue.
Rancher is not required - we tried minikube, there are also commercial CaaS frameworks - however Rancher is the simplest and fastest approach at the moment.
You are welcome to join the OOM call at 10AM EDT on Wed - we usually go through the JIRA board - and the Kubeadm work sounds like a good Epic to work on. We are very interested in various environments and alternatives for running our pods - please join.
There is also a daily OOM blitz on stabilizing the branch and deploying the vFirewall use case that you are welcome to attend
1200EDT noon until either the 4th Dec KubeCon or the 11 dec ONAP developer conference.
Hi all, I have a question. The installation page using HEAT calls for 148 vCPUs, but this page describes 64 vCPUs as needed. Why is there such a large difference? Are there differences in the items that can be installed?
Good question - as you know, CPU can be over-provisioned (threads will just queue more), unlike RAM and HD which cannot be shared. 64 vCPUs is a recommended number based on bringing up the system on 64 and 128 core systems on AWS - we top out at 44 cores during startup (without DCAE - so this may be multiplied by 3/2 in that case, as DCAE has 1/3 of the containers in ONAP). Therefore for non-staging/non-production systems you will not gain anything from having more than 44 vCores until we start hammering the system with real-world VNF traffic. The HEAT provisioning is a result of the fact that the docker allocation model is across multiple silo VMs and not flat like in Kubernetes currently, so some servers may only use 1/8 where others may peak at 7/8. It all depends on how you use onap.
You can get away during development with 8 vCores - ONAP will startup in 11m instead of 7 on 32 vCores.
Since DCAE is not currently in Kubernetes in R1 - then you need to account for it only in openstack.
Depending on the VNF use case you don't need the whole system yet, for example the vFW only needs 1.0.0. era components, where vVolte and vCPE will need new R1 components - see the HELM_APPS recommendation in this wiki.
Similar ONAP HEAT deployment (without DCAE or the OPEN-O VM - triple the size in that case) - this will run the vFirewall but not to closed-loop.
Thank you for answering my question - it makes this easier to understand. I'll use the HEAT installation and temporarily allocate 148 vCPUs because I need to use DCAE. I'll also look at the page you referenced.
I was getting the following error when running "./createConfig.sh -n onap"
Error: release onap-config failed: namespaces "onap" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "onap"
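A common workaround for that specific RBAC error (hedged - your cluster's policy may differ) is to grant the default service account in kube-system cluster-admin rights so the helm/tiller release can create the namespace:

kubectl create clusterrolebinding kube-system-default-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:default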
I think the difference between the two versions is the init container: in v1.8.3 it waits for the dependent container to come up, and because of this the dependent container sometimes times out for me, like vnc-portal.
Such as drools checking for brmsgw to come up:
2017-11-27 08:16:46,757 - INFO - brmsgw is not ready. 2017-11-27 08:16:51,759 - INFO - Checking if brmsgw is ready 2017-11-27 08:16:51,826 - INFO - brmsgw is not ready. 2017-11-27 08:16:56,831 - INFO - Checking if brmsgw is ready 2017-11-27 08:16:56,877 - INFO - brmsgw is ready!
2) Use the docker ps -a command to list the containers.
root@k8s-2:/# docker ps -a | grep sdc-be
347b4da64d9c nexus3.onap.org:10001/openecomp/sdc-backend@sha256:d4007e41988fd0bd451b8400144b27c60b4ba0a2e54fca1a02356d8b5ec3ac0d "/root/startup.sh" 53 minutes ago Up 53 minutes k8s_sdc-be_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_0
2b4cf42b163a oomk8s/readiness-check@sha256:ab8a4a13e39535d67f110a618312bb2971b9a291c99392ef91415743b6a25ecb "/root/ready.py --con" 57 minutes ago Exited (0) 53 minutes ago k8s_sdc-dmaap-readiness_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_3
a066ef35890b oomk8s/readiness-check@sha256:ab8a4a13e39535d67f110a618312bb2971b9a291c99392ef91415743b6a25ecb "/root/ready.py --con" About an hour ago Exited (0) About an hour ago k8s_sdc-be-readiness_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_0
1fdc79e399fd gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df
3) Use this command to see the docker logs
docker logs 347b4da64d9c 2>&1 | grep -iE 'error|exception'
4) Observe the error logs and exceptions.
Currently we are getting below mentioned exceptions:
Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration
2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] (sdc-normatives::upgrade_Normatives line 7) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.openecomp.sdcrests.health.rest.services.HealthCheckImpl]: Constructor threw exception; nested exception is java.lang.ExceptionInInitializerError
We are following below mentioned link for configuration.
Installing on Azure - other than the network security groups (via portal.azure.com screenshots), everything seemed to go okay up to running cd.sh.
You need to number the steps, since sometimes it's not obvious when you are switching to a new task vs. describing some future or optional part. I had to be careful not to blindly copy/paste, since you have multiple versions in the steps, some with notes like "# below 20171119- still verifying - donot use", which was confusing. The video has the steps, which is good, but it's tedious to start/stop the video and then look at the next step in the wiki. I will update when it completes.
Do we need to add port 10250 to the security groups? I got error messages on cd.sh (but admittedly I didn't watch that part of the video).
Azure VMs seem to only have a 30GB OS disk. I can add a data disk, but I think I should run the install from someplace other than root. Is that simple to change in cd.sh?
Missing from OOM (it looks like we don't need these, at least until after vf-module creation - or we are just missing jms messages):
5bc9e04a29e3 onap/sdnc-ueb-listener-image:latest "/opt/onap/sdnc/ue..." 2 days ago Up 2 days sdnc_ueblistener_container
2fc3b79f74d2 onap/sdnc-dmaap-listener-image:latest "/opt/onap/sdnc/dm..." 2 days ago Up 2 days sdnc_dmaaplistener_container
For SDC - I would raise a JIRA, but I don't see the sanity container in HEAT - I see the same 5 containers in both.
HEAT
root@onap-sdc:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9622747f5df2 nexus3.onap.org:10001/openecomp/sdc-frontend:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 0.0.0.0:8181->8181/tcp, 8080/tcp, 0.0.0.0:9443->9443/tcp sdc-FE
85733ad254f7 nexus3.onap.org:10001/openecomp/sdc-backend:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 0.0.0.0:8080->8080/tcp, 0.0.0.0:8443->8443/tcp sdc-BE
5ece278fb37c nexus3.onap.org:10001/openecomp/sdc-kibana:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 0.0.0.0:5601->5601/tcp sdc-kbn
d75c2263186d nexus3.onap.org:10001/openecomp/sdc-cassandra:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 7000-7001/tcp, 0.0.0.0:9042->9042/tcp, 7199/tcp, 0.0.0.0:9160->9160/tcp sdc-cs
25d35c470325 nexus3.onap.org:10001/openecomp/sdc-elasticsearch:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp sdc-es
OOM
ubuntu@ip-172-31-82-11:~$ kubectl get pods --all-namespaces -a | grep sdc
onap-sdc sdc-be-2336519847-knfqw 2/2 Running 0 40m
onap-sdc sdc-cs-1151560586-35df3 1/1 Running 0 40m
onap-sdc sdc-es-2438522492-8cfj1 1/1 Running 0 40m
onap-sdc sdc-fe-2862673798-4fgzp 2/2 Running 0 40m
onap-sdc sdc-kb-1258596734-z4970 1/1 Running 0 40m
You did point out the disk size requirements in the video. The issue is really that AWS makes that a setting at VM creation, while on Azure you have to create the data disk separately (or at least I couldn't find a way to do it on the original create via the portal).
BTW, thanks Brian for the review - when I started I brought up HEAT in May 2017 and enumerated all the containers to get a feel - we should have done another pass on all the vms - but without someone who would know the optional ones like in SDC we would have missed the sdc-sanity one - thanks
You can run the scripts from anywhere - I usually run as ubuntu, not root - the reason the rancher script runs as root is that otherwise you would need to log out and back in to pick up the docker user config for ubuntu.
I run either directly in /home/ubuntu or /root.
The cloned directory will put oom in either of these
For ports - yes, try to open everything - on AWS I run with an all-open CIDR security group for ease of access - on Rackspace the VM would need individual port openings.
Yes, the multiple steps are confusing - trying to help out a 2nd team that is working using Helm 2.7 to use the tpl function - I'll remove those until they are stable
Updated the wiki - I thought I had removed all the helm 2.6/2.7 content - I was keeping the instructions on aligning the server and client until we fix the vnc-portal issue under helm 2.6 - this wiki gets modified a lot as we move through the rancher/helm/kubernetes/docker versions.
Hi, I'm new to ONAP and cloud computing in general, but trying to work through the above guide. I'm at the point where I'm waiting for the onap pods to come up. Most have come up, but some seem to be stuck after 2 hrs. I'm wondering if perhaps I have insufficient memory available. I'm installing on a KVM VM with 16 vCPU, 55G RAM and 220G HD.
One thought is to shut down the VM, increase RAM to about 60G and restart, but I'm uncertain as to the potential implications. Any suggestions as to how I could proceed would be greatly appreciated.
Unless you've taken the step to remove some components from the HELM_APPS variable in the setenv.bash script (after the oom repository was cloned), you very likely require 64 GB of RAM.
I've successfully deployed a subset of the components in a 48GB RAM VM with HELM_APPS set to this:
Thanks a lot James. I have 72G on my host, but would like to leave room for additional VMs, like the vFirewall. So I'll try removing some components as you suggested - it will give me an opportunity to try the clean-up.
Anyone who tries to install/deploy the ONAP SDC container will hit an issue with the SDC pod coming up.
Exceptions:
Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration
2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] (sdc-normatives::upgrade_Normatives line 7) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.openecomp.sdcrests.health.rest.services.HealthCheckImpl]: Constructor threw exception; nested exception is java.lang.ExceptionInInitializerError
Correct, it looks like a standard spring bean startup error - specific to SDC - which should also be failing in the HEAT deployment. I tested release-1.1.0 last night to verify a merge in oom and all my pods are up except the known aaf one - the CD job is also OK.
This bothers me though, as I hope we are not missing something that only you are seeing - I will look into it more. Are you using 1.1.0 or master? (master may have issues)
Also, are you bringing up everything else? If you check the yaml there are dependencies.
In your onap-discuss post last night you did not have the dependent pods up - did fixing that resolve the issue? I quickly looked at the code and the HealthCheckImpl class is doing healthchecks, which I would expect to fail if dependent pods are not up.
The easiest way is to go to the Kubernetes UI, then under the onap-robot namespace click on the Deployments tab, then click the three dots next to the deployment to update (in this case, robot). It will pop up a window where you can edit, among other deployment parameters, the image version. Then click update. This will bounce the deployment (hence the pod) and will create a new deployment with the changes.
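The same bounce can be done from the CLI if you prefer (deployment/container names follow the onap-robot example above; the image tag is a placeholder):

# update the image on the robot deployment, which recreates the pod
kubectl -n onap-robot set image deployment/robot \
  robot=nexus3.onap.org:10001/openecomp/testsuite:1.1-STAGING-latest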
The SDNC org.ops4j.pax.logging.cfg isn't the same as the file in gerrit. I noticed there is a different file in dockerdata-nfs/onap/log/sdnc that appears to come from the OOM repo instead of the CCSDK repo (the same OOM file looks to be used for appc). Why isn't the SDNC logging configuration being used?
What you're mentioning, Brian, is the major issue we currently have in OOM: we need to fork projects' config in order to adjust to the kubernetes context, whether it's for address resolution or for logging. I'll let Michael O'Brien explain what was done for the logs, but the overall purpose wrt logging is to centralize the logs and have them browsable through a Kibana interface (using logstash). Regarding address resolution, kubernetes provides its own way of resolving services within namespaces, <service>.<namespace>:<internal-port>. Because of this, everywhere in the config where there is some network config we change it to leverage k8s networking.
Brian, yes there is a centralized logging configuration that has the RI in the logging-analytics repo - this ELK stack available on the onap-log kibana container internal port 5601 uses a filebeat container (all the 2/2 pods) to pipe the logs in through a set of PV's using the emptyDir directive in the yaml. A logging spec is being worked out.
Well, the logging team needs to find a solution for the heavy user of the local logs, where we turn on DEBUG/TRACE and generate a huge amount of log entries while stepping through the DG processing. The SDNC logging.cfg also creates the per-DG data files. I guess I can simply replace the file in dockerdata-nfs with the version I can use for support, but it seems like we need a better solution that fits both needs. Can't the logging.cfg support both the common onap logs and the SDNC-specific DEBUG logging in the /opt/opendaylight/current/data/log directory?
I am using release-1.1.0. It was working until Monday 4th Dec; after that we cleaned up everything and redeployed the pods again to test something in my environment.
After that, SDC-BE and SDC-FE never come up. We tried this on 2-3 more setups but the problem still persists.
I suspect the prepull_docker.sh script is not able to pull the images we currently require for SDC.
I am bringing up a clean release-1.1.0 environment to record an SDC video for another issue - so I will verify this again.
Anyway, the healthcheck on the CD server is OK - the only difference is that the images are cached there right now - so on the off chance that the images were removed or are not available via nexus3, this will be seen on a clean EC2 server shortly. (A real CD server that brings up a clean VM every time is in the works.)
In master (I am also testing a patch) I get the following (ignore aaf).
It could be an image issue (different images in 1.1.0 and master) - or a config issue that has not been cherry-picked to master yet (we are running the reverse). Note that portal depends on sdc - sdc is the issue.
Make sure you use release-1.1.0 - as this is our stable branch right now
See separate mail on onap-discuss - we are stabilizing master - doing the last of Alexis de Talhouët cherry picks from stable release-1.1.0 - then SDC and AAI should come up
I recommend running a full set of pods in release-1.1.0 for now - you can also assist in testing master once the merges are in so we can declare it open for pending feature commits
Atul hi, thanks for the effort helping us stabilize - Alexis de Talhouët and the AAI team have fixed the 2 aai-service and aai-traversal issues that popped up 10am Friday on release-1.1.0 - you can use that branch again.
Are you going to clean and rebuild release-1.1.0 for the prepull_docker images?
Is there any alternative way to proceed?
I have again tried release-1.1.0 today in order to bring up all my ONAP components (especially AAI and SDC), but I am facing the same issue. My SDC component is not coming up.
There is no issue with the prepull - it is just a script that greps the docker image tags from all the values.yaml files - v1.1.0 in most cases.
If you run cd.sh at the top of the page, it will clean your environment and upgrade it - or check out the commands if you want to do it yourself. There is no issue with the release-1.1.0 branch (besides a single not-required aaf container).
release-1.1.0 is stable as of 20171208:2300 EDT
As a check, can you cover off each of the steps if you don't use the automated deploy script? A sketch of the sequence follows below.
(Delete all pods, delete your config pod, remove dockerdata-nfs, source setenv.sh (make sure your onap-parameters.yaml is OK), create config, wait for it, (prepull is optional - it just speeds things up), create pods, run healthcheck, PUT cloud-region to AAI ...)
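A minimal sketch of that manual sequence, assuming the oneclick and config scripts in the oom repo (script names and flags follow the instructions on this page - verify createConfig.sh and setenv.bash against your checkout):
# Hedged sketch of a full manual refresh - adjust paths to your oom checkout
cd oom/kubernetes/oneclick
source setenv.bash
./deleteAll.bash -n onap
kubectl delete namespace onap            # assumption: the config pod lives in the 'onap' namespace
sudo rm -rf /dockerdata-nfs/onap         # clear the shared config data
cd ../config                             # onap-parameters.yaml must already be copied/edited here
./createConfig.sh -n onap
kubectl get pods --all-namespaces -a     # wait for: onap  config  0/1  Completed
cd ../oneclick
./createAll.bash -n onap                 # then run healthcheck and the AAI cloud-region PUT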
Remember we have not had an answer yet on your config - sdc will not come up unless dependent pods are up - for example, just try to run everything to start, then fine-tune a subtree of pods later.
please try the following script - it is running on the hourly CD server and 3-4 other environments OK
Hi. I am now trying to deploy ONAP on AWS using Kubernetes. Is it possible to install ONAP components on separate VMs? For example, install one aaf pod on a 64G VM and another aaf pod on a 32G VM.
And another question: does a namespace in Kubernetes equal a VM in HEAT - like the aaf VM, aai VM, etc. in the diagram?
Yes it is possible to run as many hosts as you like - this is the recommendation for a scalable/resilient system - there is a link to the SDNC initiative above - essentially you need to share the /dockerdata-nfs directory.
For your question about affinity - yes, you can assign pods to a specific host, but Kubernetes will distribute the load automatically and handle any failures for you. If you want to change this you can edit the yaml either in the checked-out repo or live in the Kubernetes console (a rough nodeSelector sketch is below).
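For example, pinning a deployment to a labelled host could look roughly like this; the label, namespace and deployment names are illustrative assumptions, not the exact OOM yaml.
# Hedged sketch of host affinity via a nodeSelector
kubectl label node <node-name> onap-role=heavy-workloads
kubectl -n onap-aai patch deployment aai-resources -p \
  '{"spec":{"template":{"spec":{"nodeSelector":{"onap-role":"heavy-workloads"}}}}}'
# Kubernetes will then only schedule that pod onto hosts carrying the label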
There is the global namespace prefix, for example "onap", and then the pod/component namespace ("aai", "aaf") - they combine as onap-aai. So the closest analogue to the HEAT VM model would be the component namespace. However, a component like onap-aai can have HA containers where individual pieces like aai-resources run 2 copies split across hosts - parts of a component can also be split, like aai-resources on one host and aai-service on another. The global namespace prefix allows you to bring up several deployments of ONAP on the same Kubernetes cluster, separated by namespace prefix and port assignment (300xx, 310xx for example).
I have installed ONAP on Kubernetes on a single host machine following the manual instructions
Now I am trying to run the vFW demo in my setup. I am facing an error when I am onboarding the vFW-vSINK VSP using the SDC portal. The error occurs during the asset creation process after the VSP is imported into the catalog. Here is the error, also attaching the screenshot
Error code SVC4614
Status code 400
invalid content Group type org.openecomp.groups.heat.HeatStack does not exist
To give a background of the process followed:
I installed Kubernetes and Rancher. Kubernetes environment was created using Rancher portal and it showed healthy state.
onap_parameter.yaml file was edited according to my OpenStack setup running on a separate host.
Thanks for the information. Yes, I am using release-1.1.0. In fact I re-created the pods once again and the error got resolved. Now I have reached a stage where I am able to create and distribute the vFW-vSINK services.
Alan, hi, there are a couple of components that fail healthcheck for up to 15 min after the readiness pod marks them as up - the liveness probe needs to be adjusted and the teams need to provide a better /healthcheck URL.
SDC healthchecks fail constantly. Even in the CI build history there is a failure in every build output I checked. Also this graph shows different results now:
Were you able to resolve the above usecaseui-gui API health check issue? I am facing the same issue, so it would be great if you have any workaround for it.
No, usecaseui-gui still fails even in Jenkins: http://jenkins.onap.info/job/oom-cd/2123/console. I have not reached the point where I will need these failing services; maybe for most of the use cases they are not needed at all.
I was able to create/deploy the vFirewall package (packet generator, sink and firewall VNFs) on the OpenStack cloud, but I couldn't log in to any of the VNF VMs.
When I debugged, I saw that I hadn't replaced the default public key with our local public key pair in the packet generator curl JSON. Now I am deploying the VNF again (the same vFirewall package) on the OpenStack cloud, this time providing our local public key in both the PG and sink JSON APIs.
I have queries for clarification: how can we create a VNF package manually/dynamically using the SDC component (so that we can get into the VNF VM and use its capabilities)? And I want to implement Service Function Chaining for the deployed vFirewall - please let me know how to proceed with that.
PS: I have installed/deployed ONAP using Rancher on Kubernetes (on an OpenStack cloud platform) without the DCAE component, so I have not been able to use Closed Loop Automation.
Could you please let me know the significance of the curl command mentioned in cd.sh (the automated script)?
The curl query present in cd.sh (the automated script to install the ONAP pods) is failing. It has three parameters:
1. A json file (not sure whether we are supposed to use the same file as specified by the ONAP community or fill in our own OpenStack details) - I have tried both. 2. A certificate file named aaiapisimpledemoopenecomporg_20171003.crt (which has NOT been attached along with the cd.sh script or specified anywhere else). 3. There is another header (-H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI="). If I use this header, the script fails. If I remove it, the PUT succeeds but the GET fails.
I am NOT sure of the significance of the below-mentioned curl command in the cd.sh file. I was just doing the vFirewall onboarding when I noticed that this curl command is required.
Moreover, the robot scripts (both ./demo-k8s.sh init_robot and ./demo-k8s.sh init) are failing.
init_robot is failing: although we entered "test" as the password, HTTP is not accepting it.
The init test case is failing with a 401 authorization error.
Could you please help! Thanks in advance!
cd.sh snippet :
echo "run partial vFW" echo "curl with aai cert to cloud-region PUT"
Hi, the curls are an AAI PUT and GET on the cloud-region - this is required as part of testing the vFW. For yourself it is optional until you need to test a use case like the vFirewall.
If your init is failing then your cloud region and tenant are not set - check that you can read them in Postman before running robot init (init_robot only exists so you can see failures on the included web server - this should pass).
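A rough shape of that cloud-region PUT follows; the host, NodePort, cloud-owner/region values and the X-FromAppId/X-TransactionId headers are assumptions - use the json file, cert and header that ship alongside cd.sh, and verify the AAI NodePort with kubectl get svc.
# Hedged sketch of the AAI cloud-region PUT from cd.sh
curl -X PUT \
  -H "Content-Type: application/json" -H "Accept: application/json" \
  -H "X-FromAppId: robot" -H "X-TransactionId: cloud-region-put" \
  -H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=" \
  --cacert aaiapisimpledemoopenecomporg_20171003.crt \
  -d @aai-cloud-region-put.json \
  "https://<k8s-host>:30233/aai/v11/cloud-infrastructure/cloud-regions/cloud-region/CloudOwner/RegionOne"
# follow with a GET on the same URL to verify (the response includes the resource-version field)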
Thank you so much for the instant response. Glad to see that all the queries have been addressed. But I am still facing some errors:
I have tried running those curl queries (which I pasted above) with the complete set of our OpenStack values. Below is the list.
Since it was suggested to just use our OpenStack tenant ID and leave the other values the same: the curl GET still shows the error. Could you please help! BR,
When we added the resource-version ("resource-version":"1513077767531",) to the AAI json file in addition to the tenant ID, the curl command was successful. We fetched the resource-version using the curl GET command.
But I am sure that everyone needs to fill in their OWN OpenStack details (rather than using the default details mentioned in the AAI json file).
The reason being that robot init is still failing. And if the robot test case has to pick up our OpenStack details via the onap-parameters.yaml file (rather than the defaults specified in the shared json file), then we should definitely pass our own OpenStack details in the AAI json file. Please advise!
2. Also, I think we need to create a separate region (like RegionThree) with our own OpenStack details, to make new entries in AAI.
2. Also, as discussed, I have checked the integration robot file used by ONAP-robot; the AAI username and password were as mentioned below:
3. I notice that the AAI logs are not getting updated when we run these curl queries that enter data into AAI. Could you please let me know how to enable AAI logs?
The last update I can see in my system for the AAI logs is from 12th Dec. But for the past few days we have constantly been running curl queries to enter data into AAI.
I have logged in to the AAI-SERVICES container but no AAI logs can be seen. Screenshot attached for your reference.
4. Moreover, aai-services is not present in the dockerdata-nfs folder - not sure why? Other sub-modules are present though.
Hi, We appreciate your exercising of the system. You likely have run into a couple issues we currently have with SDC healthcheck and Kubernetes liveness in general. Please continue to raise any jiras on issues you encounter bringing up and running ONAP in general. SDC is currently the component with the least accurate healthcheck in Kubernetes or Heat.
Currently SDC passes healthcheck about 74% of the time - if we wait about 8 min after the readiness probe declares all the containers as ready 1/1. The issue with SDC (26%), SDNC(8%), APPC (1%) in general is that their exposed healthcheck urls do not always report the system up at the appropriate time.
The workaround is to delay healthcheck for now until the containers have run for a bit - 5-10 min - which is a normal warming of the system and caches in a production system.
On the CD system, SDC comes up eventually 2/3 of the time - our issue is helping OOM and the component teams adjust the healthcheck endpoints to report proper liveness (not just 200 or a subset of rest functionality) - You both are welcome to help us with these and any other of our outstanding issues - we are expanding the team.
OOM SDC healthcheck failure 26% of the time even with 3 runs and 8 min wait state
In my case, the SDC never passed health checks even after waiting a couple of hours after everything is "Running" in kubectl. They passed health checks only after I restarted SDC. Which JIRA issue do you think this info is applicable to?
Gary Wu: For me, restarting SDC helped fix the Health-check. However when launching SDC UI, it failed to open (even though Health check was now passing).
For SDC-UI to work:
I had to restart ONAP (./deleteAll.bash -n onap; ./createAll.bash -n onap)
Made sure that SDC health check works after ONAP restart (with wait time ~ 10 min after containers start).
For this, I had to fix /etc/hosts in vnc-portal to change the SDC IP addresses since they change once you restart SDC.
However, I think I'm going to just re-deploy the entire ONAP until SDC passes the health check since I don't know what other things become out-of-date if SDC is restarted on by itself.
I also hit the same SDC problem after deploying ONAP. The health check still did not pass even after I restarted sdc (./deleteAll.bash -n onap -a sdc and ./createAll.bash -n onap -a sdc) and waited 10 minutes. It seems all SDC components were running except TITAN. I checked the log in the sdc-be container, /var/lib/jetty/logs/SDC/SDC-BE/error.log.3, and found that the Titan graph failed to initialize with an exception, com.thinkaurelius.titan.core.TitanException. Any suggestion on why Titan does not work?
{
"sdcVersion": "1.1.0",
"siteMode": "unknown",
"componentsInfo": [
{
"healthCheckComponent": "BE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
},
{
"healthCheckComponent": "TITAN",
"healthCheckStatus": "DOWN",
"description": "Titan graph is down"
},
{
"healthCheckComponent": "DE",
"healthCheckStatus": "UP",
"description": "OK"
},
{
"healthCheckComponent": "CASSANDRA",
"healthCheckStatus": "UP",
"description": "OK"
},
{
"healthCheckComponent": "ON_BOARDING",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK",
"componentsInfo": [
{
"healthCheckComponent": "ZU",
"healthCheckStatus": "UP",
"version": "0.2.0",
"description": "OK"
},
{
"healthCheckComponent": "BE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
},
{
"healthCheckComponent": "CAS",
"healthCheckStatus": "UP",
"version": "2.1.17",
"description": "OK"
},
{
"healthCheckComponent": "FE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
}
]
},
{
"healthCheckComponent": "FE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
}
]
}
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<** createGraph started **> 2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<** open graph with /var/lib/jetty/config/catalog-be/titan.properties started> 2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<openGraph : try to load file /var/lib/jetty/config/catalog-be/titan.properties> 2018-01-08T09:59:10.719Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool> 2018-01-08T09:59:10.726Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc> 2018-01-08T09:59:15.580Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool> 2018-01-08T09:59:15.581Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc> 2018-01-08T09:59:16.467Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: 10.42.243.240> 2018-01-08T09:59:16.468Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<RemoveHost: sdc-cs.onap-sdc> 2018-01-08T09:59:23.938Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.t.g.c.GraphDatabaseConfiguration||ActivityType=<?>, Desc=<Set default timestamp provider MICRO> 2018-01-08T09:59:23.946Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.t.g.c.GraphDatabaseConfiguration||ActivityType=<?>, Desc=<Generated unique-instance-id=0a2a0d4d395-sdc-be-1187942207-21tfw1> 2018-01-08T09:59:23.956Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool> 2018-01-08T09:59:23.956Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc> 2018-01-08T09:59:24.052Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool> 2018-01-08T09:59:24.052Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc> 2018-01-08T09:59:24.153Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: 10.42.243.240> 2018-01-08T09:59:24.153Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<RemoveHost: sdc-cs.onap-sdc> 2018-01-08T09:59:24.164Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.titan.diskstorage.Backend||ActivityType=<?>, Desc=<Initiated backend operations thread pool of size 96> 
2018-01-08T09:59:34.186Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<createGraph : failed to open Titan graph with configuration file: /var/lib/jetty/config/catalog-be/titan.properties> com.thinkaurelius.titan.core.TitanException: Could not initialize backend at com.thinkaurelius.titan.diskstorage.Backend.initialize(Backend.java:301) ~[titan-core-1.0.0.jar:na] at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1806) ~[titan-core-1.0.0.jar:na] at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:123) ~[titan-core-1.0.0.jar:na] at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94) ~[titan-core-1.0.0.jar:na] at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:62) ~[titan-core-1.0.0.jar:na] at org.openecomp.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:256) [catalog-dao-1.1.0.jar:na] at org.openecomp.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:207) [catalog-dao-1.1.0.jar:na] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_141] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_141] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141] at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141] at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:366) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:311) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:134) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:408) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1575) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at 
org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:207) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1131) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1059) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:835) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:741) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:467) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1128) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1022) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:512) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE] at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
From what I have seen so far, health check seems to succeed immediately after containers are ready provided the worker node has enough CPU/Memory. In my case, the worker node had 48 vCPUs and 64GB RAM.
Syed Atif Husain: For PortalApps, it looks like your system was unable to pull the image. One way to work around it is to manually pull the image and also change the pullPolicy from Always to IfNotPresent (under $OOM_HOME/kubernetes/portal/values.yaml - see here).
For vnc-portal, the pod will stay in 'PodInitializing' until portalapps starts up, as it's defined as an init-container dependency for vnc-portal (see here).
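A hedged sketch of that workaround - the image name and tag here are placeholders, not the definitive values; take the real ones from oom/kubernetes/portal/values.yaml.
# pre-pull the failing image on the worker node
docker pull nexus3.onap.org:10001/onap/portal-apps:<tag-from-values.yaml>   # placeholder name/tag
# then in oom/kubernetes/portal/values.yaml change the pull policy so the cached image is used:
#   pullPolicy: IfNotPresent      (was: Always)
# and re-create the portal pods, e.g. ./deleteAll.bash -n onap -a portal && ./createAll.bash -n onap -a portal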
I needed to restart the sdnc dgbuilder container after loading DGs via mulitple_dgload.sh, and K8s started a new instance before I could do a docker start. What is the mechanism to restart a container to pick up a change made on persistent storage for the container?
It's exactly a docker rm. With K8s you never stop/start a container; you rm and re-create it (this is done automatically by K8s when a pod is deleted). So if the changed data is persisted, then it's OK to delete the pod, and hence the container, because the new one will pick up the new data.
The K8s deployment manifest defines the contract for the pod, which in the end is the container. Deleting the pod does delete the container, and Kubernetes, based on the deployment manifest, will re-create it. Hope that clarifies things.
It does clarify things, but we will have to make sure the things we did in Docker - like editing a file inside the container and doing a stop/start or restart - can be done in K8s. This is actually a problem for debugging, where the project teams will have to make changes to support debugging in K8s. We had set up shared data in the container configuration so that we can edit values and then delete the pod to pick up the new values. This will be a tedious pain.
At the end of the day, a docker stop/docker start is just a lazy way to restart the process(es) running within the container. If the process(es) to restart are not tied to the container liveliness (e.g. PID 1), then instead of stopping and starting the container, we could simply stop and start the process within the container. I'm not too scared about this being a pain to debug, but we will see - I doubt I'm familiar enough with all of them (knowing there are around 80 containers as of today for the whole of ONAP).
I think we need to add a volume link (-v in docker) for each app whose configuration we might need to modify before a restart - dgbuilder for instance has a script to bulk-load DGs into the flows.json file, but this file would currently be lost whenever the dgbuilder/node-red pod is restarted. This would not happen in regular docker on a stop/start or restart.
We need to take a running instance of ONAP using OOM, change each application in some normal way, and then restart to confirm that we aren't losing data on a restart. This is something we did in the HEAT/Docker/docker-compose environment to make sure all the persistent storage settings were correct. Since K8s does a re-create instead of a restart, we may lose file-based configuration data. I would look at: add the vFW netconf mount to APPC, add a flow to DG builder, create and distribute a model, instantiate a vFW, execute a closed-loop policy on the vFW and vDNS; then restart all containers and confirm that the data created is still there and the same control loops still run. I suspect right now with an OOM installation that parts might not survive a docker stop and K8s re-create of the container (since we can't do a docker start).
I'm new to Kubernetes and to OOM, so the following question could have an obvious answer that I've completely missed.
Is there a reason not to use the following commands to expose the K8s containers so that you don't have to log on via the VNC server, which is just a pain?
Good question. I guess we live with port mapping requiring the vnc-portal so we can run multiple environments on the same host, each with 30xxx, 31xxx etc., but in reality most of us by default run one set of ONAP containers. Myself, when I work in Postman I use the 30xxx ports, except for the SDC GUI, which I use via the vnc-portal (see the sketch after this reply).
I think we need a JIRA to run ONAP in an effective single-port-mapping config where 8989, for example, maps to 8989 outside the namespace and not 30211 - for ease of development.
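As a quick illustration of working with the existing NodePort mappings instead of the vnc-portal (service names will vary per deployment):
# list the 30xxx NodePort mappings exposed by the ONAP services
kubectl get services --all-namespaces | grep 30
# the REST APIs can then be hit directly at http(s)://<any-k8s-host-ip>:<30xxx-nodeport>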
as a directory that is mapped from the host file system so that updates to the flows.json file in /opt/onap/sdnc/dgbuilder/releases/sndc1.0/flows/flows.json would persist across restarts/recreates of the container ?
alternatively is there a way to temporarily set the restart policy to never so that we can manually update flows.json and then restart the existing container ?
The name here has to be the same as the one specified above; it serves as an ID to correlate the mounted folder.
The hostPath here implies that you have created the folder /dockerdata-nfs/{{ .Values.nsPrefix }}/sdnc/dgbuilder/releases on the host (where {{ .Values.nsPrefix }} is onap) and put the data you wish to persist in there.
That caused a redeployment, but dgbuilder didn't like the hostPath since the files it was expecting aren't on the host until the dgbuilder image is pulled. Not sure if it's a permissions problem on the host directories.
Should we be using something more like EmptyDir{} (but that doesn't seem to take a path) ?
Brian, I forgot to mention that the data has to be put in the persisted directory on the host first. Mounting the host directory will overwrite the directory in the container. So the first time, all the data is in the persisted directory (on the host). Then you start the pod and the persisted data will be mounted in the container. From there, you can edit the persisted data either from the server or from the pod itself (a rough sketch is below).
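A hedged sketch of wiring in such a hostPath without editing the repo yaml; the deployment/container names and paths are assumptions (align them with the actual sdnc dgbuilder yaml), it assumes the deployment already declares volumes/volumeMounts lists, and the host directory must be seeded with the existing flows first since it overwrites the container directory.
kubectl -n onap-sdnc patch deployment sdnc-dgbuilder --type='json' -p='[
 {"op":"add","path":"/spec/template/spec/volumes/-",
  "value":{"name":"dgbuilder-flows","hostPath":{"path":"/dockerdata-nfs/onap/sdnc/dgbuilder/flows"}}},
 {"op":"add","path":"/spec/template/spec/containers/0/volumeMounts/-",
  "value":{"name":"dgbuilder-flows","mountPath":"/opt/onap/sdnc/dgbuilder/releases/sdnc1.0/flows"}}
]'
# the patch triggers a rolling re-create; the new pod mounts the host directory, so flows edits persist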
Hi again, Very Good idea. A lot of the applications need a way to either expose config (log, db config) into the container or push data out (logs) to a NFS mapped share on the host. My current in-progress understanding of Kubernetes is that it wraps docker very closely and adds on top of docker where appropriate. Many of the docker commands exec, log, cp are the same as we have seen. For static persistent volumes there are already some defined in the yamls using volumeMounts: and volumes:. We also have dynamic volumes (specific to the undercloud VIM) in the SDNC clustering poc - https://gerrit.onap.org/r/#/c/25467/23. We still need places where volume mounts can be done to the same directory that already has an emptyDir stream into Filebeat (which has a volume under the covers) - see
For example the following has a patch that exposes a dir into the container just like a docker volume or a volume in docker-compose - the issue here is mixing emptyDir (exposing dirs between containers) and exposing dirs outside to the FS/NFS
I have used these existing volumes that expose the logback.xml file for example to move files into a container like the MSO app server in kubernetes from /dockerdata-nfs instead of using kubectl cp.
I myself will also look into PV's to replace the mounts in the ELK stack for the CD job - that is being migrated from docker-compose to Kubernetes and for the logging RI containers.
For the question about whether we can hold off on container restarts to be able to manually update a json file exposed into the container: the model of Kubernetes auto-scaling is stateless. When I push pods without affinity rules, the containers randomly get assigned to any host, and bringing down a container either manually or because of a health-initiated trigger is usually out of the control of any OSS outside of Kubernetes - but there are callbacks. Rancher and kubeadm, for example, are northbound of Kubernetes and act as VIMs, and in the same way that a spot VM going down in EC2 gives a 2 min warning, I would expect we could register as a listener to at least a pre-stop of a container - even though it is only a second or two. I would also like to verify this and document all of this on our K8s devops page - all good questions that we definitely need an answer for.
I was getting an error message since "-y" wasn't an allowed argument. Is cd.sh checked into gerrit.onap.org somewhere so we can reference that instead of the copy on the wiki? Maybe I'm just looking in the wrong spot.
Brian, hi, you are using amsterdam - the change done by Munir has not been ported from master.
I retrofitted the CD script to fix the jenkins job and patched github to align with the new default prompt behaviour of deleteAll
yes, ideally all the scripts northbound of deleteAll should be in onap - I will move the cd.sh script into a ci/cd folder in OOM or in demo - as it clones oom inside.
Also, I'll put in an if statement on the delete, specific to amsterdam, to not require the -y option.
Actually I think this will be an issue for anyone on master/amsterdam who cloned before OOM-528 - essentially we need a migration plan.
In my case I brought up an older image of master before the change - and the cd.sh script with the -y option fails (because it is not resilient) on -y.
root@ip-172-31-48-173:~# ./cd.sh -b master
Thu Jan 11 13:48:59 UTC 2018
provide onap-parameters.yaml and aai-cloud-region-put.json
vm.max_map_count = 262144
remove existing oom
Usage: oom/kubernetes/oneclick/deleteAll.bash [PARAMs]
-u : Display usage
-n [NAMESPACE] : Kubernetes namespace (required)
-a [APP] : Specify a specific ONAP component (default: all)
from the following choices:
sdc, aai ,mso, message-router, robot, vid, aaf, uui
sdnc, portal, policy, appc, multicloud, clamp, consul, vnfsdk
-N : Do not wait for deletion of namespace and its objects
Therefore unfortunately anyone on an older branch either needs to do a git pull or edit cd.sh one-time to remove the -y - after that you are ok and effectively upgraded to
OOM-528
I will add a migration line to the last onap-discuss on this
I am new to ONAP and yesterday I set up ONAP on a permanent AWS m4.large instance which uses a dynamic public IP. Today I removed the existing ONAP environment and recreated a new environment in Rancher. After adding the environment, when I try to add a host, Rancher is not detecting the new public IP. In the register command Rancher is still referring to yesterday's public IP, which is no longer valid.
Please let me know the steps required to restart ONAP on a dynamic-IP server which needs to be shut down and restarted on a daily basis.
Hi, that is a common issue with Rancher - it needs a static IP or DNS name.
You have a couple workarounds, elastic IP, elastic IP + domain name, edit the host registration URL in rancher, or docker stop/rm rancher and rerun it
I opt for elastic IP + DNS entry - in my case I register onap.info in Route53, create an EIP in the EC2 console, then associate the EIP with the instance (its network interface) before bringing up rancher/kubernetes/helm.
This will also allow you to save the AMI and bring it up later with a 20 min delay until it is fully functional - provided you keep the EIP and domain A record.
this how the CD system works - see the following but do not touch anything it is used for deployment testing for the first 57 min of the hour. http://amsterdam.onap.info:8880/
Sorry, I was answering your first question from memory this morning - I didn't realize you added a 2nd comment with your workaround. Yes, that is OK, but we agree it is a lot of work. What you may do - and I will try it - is use a very small static-IP host, a 4G machine that does not run the ONAP pods; they will all have affinity to a 2nd 64G host that can have a dynamic IP - but the Rancher server host must be static.
Another workaround that I have not tried is automated host authentication via REST or CLI - this I need to research.
But still the easiest way is to bring up the EC2 VM with an EIP (it will cost about $2 per month when not in use though) - you should have an allocation of 5 on your AWS account - I asked for 10.
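For reference, the CLI equivalent of the EIP steps described above looks roughly like this (instance and allocation IDs are placeholders):
# Hedged sketch of allocating and attaching an Elastic IP with the AWS CLI
aws ec2 allocate-address --domain vpc                      # note the returned AllocationId
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0
# then optionally point a Route53 A record at the EIP and register the Rancher host against that name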
We ran prepull_docker.sh on 4 different k8s nodes at the same time and got 75, 78, 80 and 81 images (docker images | wc -l). We verified the pulling process using (ps -ef | grep docker | grep pull); all pulling processes had completed. Do you know why we got a different number of images?
Yes, weird - intermittent errors usually mean the underlying cloud provider, I sometimes get pull errors and even timeouts - used to get them on heat as well. There are issues with nexus3 servers periodically due to load, upgrades and I have heard about a serious regional issue with mirrors. I do not know the cloud provider that these servers run on - the issue may be there. The script is pretty simple - it greps all the values.yaml files for docker names and images - there were issues where it parsed incorrectly and tried to pull just the image name or just the image version - but these were fixed - hopefully no more issues with the sh script.
There may also be issues with docker itself doing 80 parallel pulls - we likely should add a -serial flag to pull in sequence (something like the loop sketched below) - it would be less performant.
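A hedged approximation of such a serial pull - this is not an actual flag of prepull_docker.sh, and it assumes the image references appear as "image:" entries in the values.yaml files (the real script parses the image/version keys more carefully):
for img in $(grep -h "image:" oom/kubernetes/*/values.yaml | awk '{print $2}' | sort -u); do
  docker pull "$img"        # one pull at a time instead of ~80 in parallel
done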
you can do the following on a clean system to see the parallel pulls in progress and/or count them
ps -ef | grep docker | grep pull | wc -l
In the end there should be no issues because anything not pulled in the prepull will just get pulled when the docker containers are run via kubectl - they will just start slower the first time.
Please note that there are a couple of "huge" images on the order of 1-2G, one of them for SDNC - and I have lately seen issues bringing up SDNC on a clean system - it required a ./deleteAll.bash -n onap -a sdnc and a re-run of ./createAll.
Another possibility is that docker is optimizing or rearranging the pulls and running into issues depending on the order.
Another issue is that the 4 different servers have different image sets - the docker images | wc -l may be picking up server or client images only present on one or more of the nodes. If you look at a cluster of 4 servers - I have one - the master has a lot more images than the 3 clients, and the clients usually run different combinations of the Kubernetes system images (for what reason I am still looking into) - before you even bring up the ONAP containers.
Let's watch this - there is enough writing here to raise a JIRA - which I will likely do.
Michael O'Brien - I am trying to bring up vid, robot, and aai w/ the latest oom, seeing this error on several aai pods:
Error: failed to start container "filebeat-onap-aai-resources": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/log/filebeat/logback/filebeat.yml\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/2234aef661aa61185f7fb8fd694ec59d29f82c2478d9de1beee0a282e4af4936\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/2234aef661aa61185f7fb8fd694ec59d29f82c2478d9de1beee0a282e4af4936/usr/share/filebeat/filebeat.yml\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
The config job seems to have failed with an error but it did create the files under /dockerdata-nfs/onap
Hi, good question and thank you for all your help with OOM code/config/reviews.
That particular "not a directory" error is a sort of red herring - it means one of 2 things: either the container is not finished initializing (the PVs and volume mounts are not ready yet - it will go away after the pod tree is stable), or your config pod had an issue (not recoverable without a delete/purge). These errors occur on all pods for a while until the hierarchy of dependent pods is up and each one goes through the init cycle - however, if you still see them after the normal 7-15 min startup time and they do not pass config, then you likely have an issue with the config pod pushing all the /dockerdata-nfs files (this is being removed and refactored as we speak) - due to missing config in setenv.bash and onap-parameters.yaml (it must be copied to oom/kubernetes/config).
Also, that many failures usually means a config pod issue - or a full HD or RAM issue (if you have over 80G HD (you need 100G over time) and over 51G RAM, then it is a config pod issue).
How to avoid this: see the cd.sh script attached and linked at the top of the page - this is used to provision a system automatically on the CD servers we run the hourly Jenkins job on - the script can also be used by developers wanting a full refresh of their environment (delete, re-pull, config up, pods up, run healthcheck...).
If you are running the system manually - use the cd.sh script or the manual instructions at the top in detail - the usual config issue is forgetting to configure onap-parameters.yaml (you will know this by checking the config pod status). The second usual issue is failing to run setenv.sh to pick up the docker and other env variables - this will also fail the config container.
kubectl get pods --all-namespaces -a
it must say
onap config 0/1 Completed 0 1m
do the following to see any errors - usually a missing $variable set
kubectl --namespace onap logs -f config
as of an hour ago these were the failing components - no AAI, vid or robot
As an additional reference you can refer to the running master CD job - for the times when you might think it is actually failing - not just locally.
00:08:17Basic A&AI Health Check | PASS |
00:08:17------------------------------------------------------------------------------
00:08:18Basic VID Health Check | PASS |
Also, AAI has not been failing healthcheck for at least the last 7 days - actually I think it has failed only once since the first week of Dec 2017 - it is one of the most stable ONAP components.
Let me know if this fixes your issues - if your config pod is busted, then you will need to deleteAll the pods, purge the config pod and rerun setenv, the config pod and createAll - see the script for the exact details.
Thanks Michael O'Brien, I needed to refresh the config pod and once i got "completed" I was able to get aai and several others going! Thanks for your help!
This is a pretty basic question. I've been having some trouble getting SDNC running (still troubleshooting), but I was then looking at the readiness docker image and trying to understand how it works.
I think I understood most of it, but I couldn't figure out how the value of the "K8S_CONFIG_B64" environment variable was being set, as there seems to be some "magic" involved, and I was hoping somebody could give me a hint.
Andrew, hi, just to cover off SDNC - since clustering was put in, the images have increased in number and size - there may be a timeout issue. So on a completely clean VM you may need to delete and create -a sdnc (as below) to get around this issue, which only appears on slow machines (those with fewer than 16 cores).
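Concretely, that is just the per-component delete/create already mentioned elsewhere on this page:
./deleteAll.bash -n onap -a sdnc
./createAll.bash -n onap -a sdnc
# then watch the pods come back: kubectl get pods --all-namespaces -a | grep sdnc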
Last December (2017) I managed to deploy an almost-amsterdam version of ONAP using oom on a single Ubuntu VM. I used a manual list of commands (cd.sh was not available at the time) as explained on this page. The installation used:
Docker 1.12, Rancher server 1.6.10, Kubernetes 1.8.6, Helm 2.3.0
Most containers came up. Over time (weeks) things degraded.
Back from the holidays, I tried to reinstall (this time aiming for the amsterdam branch) from scratch and had issues with Rancher.
To remove the possibility that my host was corrupted in some way, today I used a brand new Ubuntu 16.04.4 VM and tried to create the same environment for ONAP. I executed the commands in oom_rancher_setup_1.sh by hand so that I could better control the docker installation and the usermod command.
I ended up with the same problem I had on my old VM yesterday.
The problem is as follows: in the Rancher Environment GUI I created a Kubernetes environment. Once I made it the default, the state became "Unhealthy". Rancher won't tell you why!
Then I tried anyway to add a host. When running the command:
the agent started to complain that it could not connect to the server - SSL certificate verification is failing.
I get output like this:
Unable to find image 'rancher/agent:v1.2.6' locally
v1.2.6: Pulling from rancher/agent
b3e1c725a85f: Pull complete
6a710864a9fc: Pull complete
d0ac3b234321: Pull complete
87f567b5cf58: Pull complete
063e24b217c4: Pull complete
d0a3f58caef0: Pull complete
16914729cfd3: Pull complete
2ce3828c0b9d: Pull complete
7df47a98fc4b: Pull complete
Digest: sha256:a68afd351c7417e6d66a77e97144113ceb7a9c3cdd46fb6e1fd5f5a5a33111cd
Status: Downloaded newer image for rancher/agent:v1.2.6
INFO: Running Agent Registration Process, CATTLE_URL=https://10.182.40.40:8880/v1
INFO: Attempting to connect to: https://10.182.40.40:8880/v1
ERROR: https://10.182.40.40:8880/v1 is not accessible (server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none)
ERROR: https://10.182.40.40:8880/v1 is not accessible (server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none)
ERROR: https://10.182.40.40:8880/v1 is not accessible (server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none)
ERROR: https://10.182.40.40:8880/v1 is not accessible (server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none)
^C
The Unhealthy state might be due to the web client having the same communication issue.
This does not appear to be an ONAP specific issue, since I'm failing in one of the first installation step which is to get a Rancher server and agent working together.
This behavior was only observed upon my return on January 9th. In December I had no such issue.
Could a certificate be expired? Where are these certificates? (In the docker images I suspect)
Hi, welcome. Also very detailed and complete environment description - appreciated.
I am extremely busy still - but your post stood out. I will return in more detail on the weekend.
For now, yes, I have also had issues connecting the client - usually this involved a non-static IP, for example if I saved an AMI on AWS and got a different EIP. There are several fixes for that one - use a static EIP and/or assign a domain name to it. You can also retrofit your server - I turned off security on the CD PoC for a couple of days.
Update: I reproduced the same SSL issues using a small vagrant VM (2 CPU, 2GB). The VagrantFile uses: config.vm.box = "ubuntu/xenial64"
From this VM I ran the following commands:
sudo curl https://releases.rancher.com/install-docker/1.12.sh | sh
sudo docker run -d --restart=unless-stopped -p 8880:8080 rancher/server:v1.6.10
# From the Rancher web-ui activated a Kubernetes environment
# then got (and exec) the following command to add a host
sudo docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.6 https://192.168.16.61:8880/v1/scripts/0D95310D5AF5AC047A37:1514678400000:f4CQEfzqgONjYc3vZlq6K9MbTA
I also tried rancher server v1.6.11. Same issues were seen.
Could you get past the issue? I have also manually installed the components, but was unable to get ONAP up and running. It would be helpful if you could list the steps taken to install and run ONAP.
I could post my notes. They would look like a summary of information already on this page.
If some think it would be useful, I could do so.
In order to avoid too much redundancy on this page, could you tell us a bit more about where you have issues. Then maybe I could post a subset of my notes around this area.
Basically I see this installation being made of 2 major steps:
Install the infrastructure: Docker 1.12, Rancher server 1.6.10, Kubernetes 1.8.6, Helm 2.3.0. After this step you should be able to go to the Rancher web UI and see the rancher/kubernetes docker instances and pods running. This means running oom_rancher_setup_1.sh, which in my case I ran manually, followed by some interaction in Rancher's web UI to create a k8s env and add a host.
You can find more supporting debugging for the same SDNC SLI-API in the attached document.
After running the installSdncDb.sh script, logging into the SDNC container and logging into the SDNC database, we found that the "VLAN_ID_POOL" table does not exist, even though the database was showing that the mentioned table exists. It was present in a stale format.
Rahul Sharma, I followed the steps in the link above but I am facing issues related to connectivity to OpenStack. I guess I am missing some basic setup in my OpenStack.
I have created a network and subnet on OpenStack. I am using their IDs in the param file for OPENSTACK_OAM_NETWORK_ID and OPENSTACK_OAM_SUBNET_ID respectively. What should I use for OPENSTACK_PUBLIC_NET_ID? Do I have to create another network? How do I ensure my ONAP VM is able to connect to the OpenStack VM? [I have installed ONAP OOM on one Azure VM and OpenStack on another VM.]
Syed Atif Husain: OPENSTACK_PUBLIC_NET_ID should be one of the networks on your OpenStack that is publicly accessible. One of the public IPs assigned to your vFW_x_VNF (x = SINC or PG) would belong to this network.
You don't need to create other networks: unprotected_private_net_id (zdfw1fwl01_unprotected), unprotected_private_subnet_id(zdfw1fwl01_unprotected_sub), protected_private_net_id(zdfw1fwl01_protected), protected_private_subnet_id(zdfw1fwl01_protected_sub) would be created as part of vFW_SINC stack deployment.
The "pub_key" attribute will be used to communicate with the VM on Openstack.
Note: the values sent in the SDNC-Preload step are used to create the stack; so if you want to update something, you can do it then.
Also, when I tested, my ONAP was running on Openstack; running ONAP on Azure should be similar considering that MultiVIM should take care of different platforms underneath but you can verify in that area. Have a look at the VF instantiation flow for Release 1.1 here
When I run cd.sh, the config pod isn't coming up - it is shown in an error state. Does anyone know why this happens? In the kubectl logs I see the following error: "DEPLOY_DCAE" must be set in onap-parameters.yaml.
You need to give the DCAE-related params in the onap-parameters.yaml file. Otherwise, remove the dcae component from HELM_APPS in oom/kubernetes/oneclick/setenv.bash if you don't want to install DCAE or if your OpenStack setup is not ready.
Refer manual instructions under the section 'quickstart installation'
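A hedged sketch of checking those two options before re-creating the config pod (exact key/variable syntax may differ in your checkout):
grep DEPLOY_DCAE onap-parameters.yaml            # confirm the flag is present in the file you copied to oom/kubernetes/config
grep HELM_APPS oom/kubernetes/oneclick/setenv.bash
# to skip DCAE entirely, drop the dcae entry from the HELM_APPS list in setenv.bash, then re-run the config/create steps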
I won't have time until later today to check - but if the config container complains about a missing DCAE variable - then there is a chance the config yaml is missing it
-----Original Message----- From: Michael O'Brien Sent: Tuesday, January 23, 2018 07:04 To: 'Pavan Gupta' <pavan.gupta@calsoftinc.com> Subject: RE: Issues with cd.sh sciprt
Pavan,
Hi, the script mirrors the manual instructions and runs ok on several servers including the automated CD server.
You place the 2 aai files, the onap-configuration.yaml file beside the cd.sh script and run it (this assumes you have run the rancher config ok)
I would need the error conditions pasted to determine if you missed a step - likely during the config pod bootstrap - could you post the errors on the config pod you see.
Also verify all versions and prerequisites, Rancher 1.6.10, helm 2.3.x, docker 1.12.x, Kubernetes 1.8.x
Try to come to the OOM meeting and/or raise a JIRA and we can look at it from there.
DCAE is in flux but there should be no issues with the 2.0.0 tag for the config container
I have posted this query on the wiki page as well. I got the installation script working and moved on to running cd.sh. The config pod is shown in an error state. I looked at the Kubernetes log and it says DEPLOY_DCAE should be set in the onap-parameters.yaml file. I tried setting this parameter, but the error still continues. Any idea what's going wrong or what needs to be done to resolve this issue?
I have setup onap via OOM via Rancher on VMware Workstation 14 and VMware Fusion 8 with no issues
The config in onap-parameters.yaml must point to an openstack user/pass/tenant so that you can create a customer/tenant/region in AAI as part of the vFW use case. You can use any openstack or Rackspace config - you only need keystone to work until you get to SO instantiation.
In the future we will be able to configure Azure or AWS credentials via work being done in the Multicloud repo.
Hi, I got to the point of getting a VNF deployed using the Kubernetes deployment, so just wanted to let you know it can work in different environments.
I'm using Rancher and a host VM on a private Red Hat OpenStack.
A couple of local workarounds were needed, and I had to redeploy AAI as it didn't come up the first time.
However, SDNC didn't work and I had to change it from using the NFS server to using the Kubernetes volumes, as I was getting an error in the nfs-provisioner-.... pod referring to all the ports, but I think I have them all open etc.
Why is volume handling for SDNC different to the other namespaces ?
Volume handling for SDNC is done differently for 2 reasons:
To support scaling SDNC (ODL-MDSAL and MySQL): for dynamically creating persistent volumes when scaling MySQL pods, we need storage classes. And to support Kubernetes deployed on local VMs, an 'nfs'-based provisioner was one of the available options.
To make sure that volumes are persisted after Pod restart; hence cannot use pod's empty directory.
Not sure why nfs-provisioner isn't starting for you when you have the ports open?
We created a service and distributed using SDC UI. As per the SDC video, the service should be distributed in AAI, VID and MSO. List:
Vendor name : MyVendor
License agreement : MyLicenseAgreement
Entitlementpool : MyEntitlementPool
Service : vFW-vSINK-service
VSP : vFW-vSINK
2. After running the init robot testcase, we can see that only the default services are being listed. The service, which we created using SDC, is not visible in AAI.
3. The curl queries for SDC are not working. We tried many curl queries for the same, to fetch the service name/instance.
Pavan, hi, that Ubuntu 14 version is a leftover from the original HEAT parameters - it was used to spin up VMs (the original 1.0 HEAT install had a mix of 14/16 VMs - I don't know why we don't also list the 16 version). You can ignore it, as we are only using docker containers in Kubernetes right now.
After the installation, I tried http://10.22.4.112:30211 on the browser and the ONAP portal didn't open up. Not all services are shown 1/1 (please check the output below)
I am not sure why I can't see the ONAP portal now.
Following is the error message in Kubernetes. It's not able to pull the container image.
Failed to pull image "nexus3.onap.org:10001/onap/vfc/ztevnfmdriver:v1.0.2": rpc error: code = 2 desc = Error: image onap/vfc/ztevnfmdriver:v1.0.2 not found Error syncing pod
Check the oom/kubernetes/<component>/values.yaml file for the respective ONAP component (say vfc, portal, MSO, etc.) and look for the pull policy option.
1/25/2018 11:06:59 AM2018-01-25 19:06:59.777976 I | Using https://kubernetes.default.svc.cluster.local:443 for kubernetes master 1/25/2018 11:06:59 AM2018-01-25 19:06:59.805097 I | Could not connect to Kube Masterthe server has asked for the client to provide credentials
Has anyone seen this issue or know how to solve it?
I guess this would be on Amsterdam. You need to update the kube2msb deployment file with your K8s token. In Rancher, under your environment, go to Kubernetes → CLI → Generate Config; this should give you your token to authenticate to the K8s API for your deployment.
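As a hedged hint, once the generated kubectl config from Rancher has been saved locally, the bearer token can be read back out of it for the kube2msb deployment (the file location and field layout are assumptions):
grep token ~/.kube/config      # the value after "token:" is what the kube2msb deployment needs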
stored passwd in file: /.password2 /usr/lib/python2.7/dist-packages/supervisor/options.py:297: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security. 'Supervisord is running as root and it is searching ' 2018-01-25 21:47:52,310 CRIT Supervisor running as root (no user in config file) 2018-01-25 21:47:52,310 WARN Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing 2018-01-25 21:47:52,354 INFO RPC interface 'supervisor' initialized 2018-01-25 21:47:52,357 CRIT Server 'unix_http_server' running without any HTTP authentication checking 2018-01-25 21:47:52,357 INFO supervisord started with pid 44 2018-01-25 21:47:53,361 INFO spawned: 'xvfb' with pid 51 2018-01-25 21:47:53,363 INFO spawned: 'pcmanfm' with pid 52 2018-01-25 21:47:53,365 INFO spawned: 'lxpanel' with pid 53 2018-01-25 21:47:53,368 INFO spawned: 'lxsession' with pid 54 2018-01-25 21:47:53,371 INFO spawned: 'x11vnc' with pid 55 2018-01-25 21:47:53,373 INFO spawned: 'novnc' with pid 56 2018-01-25 21:47:53,406 INFO exited: x11vnc (exit status 1; not expected) 2018-01-25 21:47:54,681 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2018-01-25 21:47:54,681 INFO success: pcmanfm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2018-01-25 21:47:54,681 INFO success: lxpanel entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2018-01-25 21:47:54,681 INFO success: lxsession entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2018-01-25 21:47:54,683 INFO spawned: 'x11vnc' with pid 68 2018-01-25 21:47:54,683 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2018-01-25 21:47:56,638 INFO success: x11vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
The ONAP system/pods enter the CrashLoopBackOff state only when you delete the dockerdata-nfs data for the respective ONAP component.
e.g. rm -rf /dockerdata-nfs/portal has been run. Now the ONAP system has no way of knowing which data to delete, so there are uncleaned/dangling links.
Solution:
If you kept a backup of the dockerdata-nfs folder (either the complete folder or just portal), then put it back. The ONAP pods will take the portal data from dockerdata-nfs; then delete the onap-portal pod and create it again.
For the vnc-portal, I faced a similar issue today (a consolidated sketch of these steps follows below):
1. Run the command: kubectl describe po/<container-for-vnc-portal> -n onap-portal
2. Look for the docker image it has a dependency on - I think it's mariadb or some other docker image.
3. Run the command: docker images | grep <image found in step 2> - note: the respective image will be missing.
4. Pull the respective docker image (as found in step 2): docker pull <image name>. Kubernetes will pick up the newly pulled docker image and the issue for vnc-portal will be resolved.
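The same steps as one-liners (the pod/image names are placeholders):

kubectl describe po/<vnc-portal-pod-name> -n onap-portal   # step 2: find the dependency image it is waiting on
docker images | grep <image-from-describe>                 # step 3: confirm the image is missing locally
docker pull <image-from-describe>                          # step 4: pull it manually - kubernetes picks it up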
Guys, it helps if you post your versions (onap branch, helm version, kubernetes version, rancher version, docker version), whether your config container ran OK (0/1 Completed), and that you have all dependent containers up (for example vnc-portal needs vid to start).
A common issue is helm related (helm 2.5+ running against amsterdam - stick to 2.3 on that branch).
When you say helm 2.5+, are you referring to the server version or the client? I only installed the helm client v2.1.3 and I think rancher installs the helm server.
The onap branch I am using is amsterdam.
All the pods are up and running except for the vnc-portal container in the onap-portal namespace and the elasticsearch container in onap-log.
I followed the instructions specified in the post below by kranthi to solve the problem.
NOTE: The main reason for this issue is that I did not have the recommended versions of helm/rancher/kubernetes. It was not easy to align the versions, so I tried the fix suggested below and it worked for me. You can also try it and see if it solves your issue.
I had the same problem with the Amsterdam branch. The master branch has fixes to resolve this. Basically, the helm chart defined a lifecycle PostStart hook which may run before the container itself starts (it's not guaranteed). So please take the portal folder from master and replace it in Amsterdam, or just replace the resources folder inside portal (from master) and also the portal-vnc-dep.yaml file inside templates, from master into Amsterdam.
Guys, follow or use as a reference the scripts below - they will create a rancher environment and install onap on either amsterdam or master (use your own onap-parameters.yaml).
Sorry for the trouble. I am a beginner with ONAP. I wanted to install ONAP in an AWS environment, but as I went through your video I found I need the onap-parameters.yaml file, which includes the OpenStack credentials. Do I need this for installing ONAP in an AWS environment? I want to install ONAP on an AWS instance only.
Is it optional, or must I have OpenStack credentials?
Hi, no, you can put fake user/pass/token strings there for now. When you get to the point of running the use cases - like the vFW - and need to create a customer/tenant/region in AAI, that is where real credentials will be required to authenticate to Keystone. Later, when you orchestrate VNFs via SO, full functionality will be required.
For now use the sample one in the repo.
Let us know how things work out - and don't hesitate to ask questions about AWS in your case when bringing up the system.
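A rough idea of the placeholder approach (the sample file name and key names here are from memory and may differ in your branch - use whatever the repo sample actually contains):

# copy the sample and leave fake values until you need real Keystone access
cp oom/kubernetes/config/onap-parameters-sample.yaml oom/kubernetes/config/onap-parameters.yaml
# keys along the lines of OPENSTACK_USERNAME / OPENSTACK_API_KEY / OPENSTACK_TENANT_NAME
# can stay as fake strings for a first bring-up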
Before I start installing ONAP, can you please help me understand the need for a domain name for the installation?
Can't I use an Elastic IP only?
And about the use cases - can you let me know which use cases will work under this installation of ONAP on Kubernetes without having OpenStack credentials?
root@ip-10-0-1-113:~# ./cd.sh -b release-1.1.0
Wed Jan 31 06:53:31 UTC 2018
provide onap-parameters.yaml and aai-cloud-region-put.json
vm.max_map_count = 262144
remove existing oom
./cd.sh: line 20: oom/kubernetes/oneclick/setenv.bash: No such file or directory
./cd.sh: line 22: oom/kubernetes/oneclick/deleteAll.bash: No such file or directory
Error: incompatible versions client[v2.8.0] server[v2.6.1]
sleeping 1 min
deleting /dockerdata-nfs
chmod: cannot access '/dockerdata-nfs/onap': No such file or directory
pull new oom
Cloning into 'oom'...
fatal: Remote branch release-1.1.0 not found in upstream origin
start config pod
./cd.sh: line 43: oom/kubernetes/oneclick/setenv.bash: No such file or directory
moving onap-parameters.yaml to oom/kubernetes/config
cp: cannot create regular file 'oom/kubernetes/config': No such file or directory
./cd.sh: line 47: cd: oom/kubernetes/config: No such file or directory
./cd.sh: line 48: ./createConfig.sh: No such file or directory
verify onap-config is 0/1 not 1/1 - as in completed - an error pod - means you are missing onap-parameters.yaml or values are not set in it.
No resources found. waiting for config pod to complete
No resources found. waiting for config pod to complete
No resources found. waiting for config pod to complete
No resources found. waiting for config pod to complete
No resources found. waiting for config pod to complete
No resources found. waiting for config pod to complete
No resources found. waiting for config pod to complete....
fatal: Remote branch release-1.1.0 not found in upstream origin
release-1.1.0 was deleted a month ago - yes, I had a comment in my cd.sh script as an example for master or that release - I will update the comment to print "amsterdam" so there is no confusion.
Check your cd script output.rtf - you are not running the correct helm version (likely you are running 2.3 - you should be running 2.6+, ideally 2.8.0).
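To check and fix the helm versions (standard helm 2.x commands):

helm version          # compare the client and server (tiller) versions
helm init --upgrade   # upgrade tiller to match a newer client if they are mismatched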
For the VNF image pull - I have not looked at this - verify the right tag is being pulled from nexus3 and close off the JIRA if you find it.
If you look at your logs, you will see you have the right number of non-running containers (2), but you will notice that some of your createAll calls are failing on the new template tpl code added last week (yes, the author of that change should have notified the community of the pending change - I picked up the comms task later that day).
like the following
Error: parse error in "appc/templates/appc-conf-configmap.yaml": template: appc/templates/appc-conf-configmap.yaml:8: function "tpl" not defined
The command helm returned with error code 1
Check this page for the right version - it changed on Wednesday.
I've attached the helm files I made for this workaround. If you just expand them into ..../oom/kubernetes you should get a directory called ves, and then you can just run ../oneclick/createAll.bash -n onap -a ves
Hi Andrew Fenner, it's nice to see that it works for you. I have an OOM setup without DCAE. Now can I download the ves-oom.tar and create the pod? How can I make the other components point to this standalone DCAE model? We have to change vFWCL.zip to give the DCAE collector IP and port, right? Can you give more details on the Closed Loop end?
The file is attached in the last post. The VES and CDAP are integrated with the rest of the components by the k8s DNS. The way to expose the VES port is by using a k8s service (see the attached files - a rough sketch of one possible approach is below).
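Untested sketch only - the namespace, deployment name and port here are placeholders, not necessarily what the attached charts use:

# expose the VES collector on a NodePort so VNFs outside the cluster can reach it
kubectl -n onap-dcae expose deployment ves-collector --type=NodePort --port=8080
kubectl -n onap-dcae get svc ves-collector    # note the allocated NodePort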
When we were doing the SDNC preload operation for SINK and PG, we noticed that with the modified json files for SINK (our values for VNF details, service instance, etc.) the existing/predefined vFWCL instance got changed. Was that correct?
Image pull errors usually mean you cannot reach nexus3.onap.org - especially that many - which could be your proxy (switch to a cell connection to verify).
Do a manual docker pull to check this.
Another reason could be that you did not source setenv.bash, where the docker repo credentials/URL are set.
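A quick manual check against the ONAP nexus3 registry (using the same anonymous docker/docker credentials mentioned elsewhere on this page, and the image tag from the error above):

docker login -u docker -p docker nexus3.onap.org:10001
docker pull nexus3.onap.org:10001/onap/vfc/ztevnfmdriver:v1.0.2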
Remember this is Kubernetes not Docker. Kubernetes is a layer on top of Docker - you don't need to run any docker commands except when installing the Rancher wrapper on Kubernetes - after that always use kubectl
Follow the instructions on this wiki "exactly" or use the scripts for your first time install
Pulling docker images yourself is not required - the only reason for the prepull is to speed up the ONAP startup - for example, running createAll a second time will run faster since the images were pulled earlier.
The images that the values.yaml files reference are the ones pulled automatically by Kubernetes - you don't need later versions unless there are app fixes we have not switched to yet.
If you are having issues with docker pulls then it is something in your system behind your firewall - I can't remember if it was you (I answer a lot of support questions here) - did you do a proper source of setenv.bash and also make sure your config pod is OK?
If you really want to see ONAP work - just to verify your procedure - run it on a VM in a public cloud like AWS or Azure and apply that to your local environment. I am thinking that there may be an issue pulling from nexus3 - I have seen this in other corporate environments.
I followed the instructions above to run ONAP on Kubernetes, where the server and client are co-located.
I have two issues regarding the implementation:
1. When I check the pods with the kubectl get pods --all-namespaces -a | grep 2/2 command, I receive the following information, in which the portal and policy are not listed.
2. In the next step, I just followed the vnc-portal instructions through the video, but the portal pod is not available there either. In principle, I tried to add the portal but an error comes up that "the portal already exists". In addition, I looked for the ete-k8s.sh file in dockerdata-nfs but there are no files except eteshare and robot!
For 1: Yes, policy and portal should come up in the above 'kubectl' result. I would recommend checking your setenv.bash under $HOME/oom/kubernetes/oneclick and checking which HELM_APPS you are deploying. Make sure it has policy and portal in there.
For 2: ete-k8s.sh is present under $HOME/oom/kubernetes/robot, not under dockerdata-nfs. eteshare under dockerdata-nfs/onap/robot contains the logs of the run when you execute ete-k8s.sh.
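For example:

grep HELM_APPS $HOME/oom/kubernetes/oneclick/setenv.bash   # confirm policy and portal are in the list
cd $HOME/oom/kubernetes/robot && ./ete-k8s.sh health       # robot health check once the pods are up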
Regarding the first issue: Policy and Portal are there.
Regarding the second issue: I just followed the instructions for the vnc-portal. The video shows that ete-k8s.sh should appear in dockerdata-nfs when running ./createAll.bash -n demo
Because of the portal, I cannot check the AAI endpoints or run the health check!
I think I have mistakenly created two instances: one based on the instructions provided in ONAP on Kubernetes (onap) and a second based on the vnc-portal instructions (demo). Should I delete one of the instances, for example demo? If yes, please tell me which command I should use!
If I delete one instance, does it affect the other one?
When I ran kubectl get pods -n onap-portal for onap, I received the following messages:
root@omap:~/oom/kubernetes/robot# kubectl get pods -n onap-portal
NAME READY STATUS RESTARTS AGE
portalapps-dd4f99c9b-lbm7w 0/2 Init:Error 0 24m
portaldb-7f8547d599-f2wlv 0/1 CrashLoopBackOff 5 24m
portalwidgets-6f884fd4b4-wl84p 0/1 Init:Error 0 24m
vnc-portal-687cdf7845-clqth 0/1 Init:0/4 1 24m
But for demo it is:
root@omap:~/oom/kubernetes/robot# kubectl get pods -n demo-portal
No resources found.
In the other case, when I run the health check (as you mentioned), I receive the following message:
root@omap:~/oom/kubernetes/robot# ./ete-k8s.sh health
No resources found.
error: expected 'exec POD_NAME COMMAND [ARG1] [ARG2] ... [ARGN]'.
POD_NAME and COMMAND are required arguments for the exec command
See 'kubectl exec -h' for help and examples.
I am not sure about the demo-portal. But yes, if the ports are already being used, there would be conflicts when launching a similar pod again.
I would recommend clearing up and starting afresh.
Here is what I would do:
Delete the onap containers. Basically follow the steps here.
Before you restart again, execute kubectl get pods --all-namespaces -a to make sure that none of the onap containers are running. Also check if there are any 'demo' portal pods running. You should only see Kubernetes-specific pods.
Once clean, run createConfig and then createAll for onap deployment.
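The clean-and-redeploy sequence roughly looks like this (a sketch - paths assume the standard oom checkout used elsewhere on this page):

cd ~/oom/kubernetes/oneclick
./deleteAll.bash -n onap
./deleteAll.bash -n demo
kubectl get pods --all-namespaces -a        # wait until only the kube-system pods remain
cd ../config && ./createConfig.sh -n onap   # recreate the config pod
cd ../oneclick && ./createAll.bash -n onap  # redeploy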
********** Cleaning up ONAP: release "demo-consul" deleted namespace "demo-consul" deleted clusterrolebinding "demo-consul-admin-binding" deleted Service account demo-consul-admin-binding deleted.
Error: could not find a ready tiller pod namespace "demo-msb" deleted clusterrolebinding "demo-msb-admin-binding" deleted Service account demo-msb-admin-binding deleted.
Error: could not find a ready tiller pod namespace "demo-mso" deleted clusterrolebinding "demo-mso-admin-binding" deleted Service account demo-mso-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-message-router" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-message-router-admin-binding" not found Service account demo-message-router-admin-binding deleted.
Error: could not find a ready tiller pod namespace "demo-sdnc" deleted clusterrolebinding "demo-sdnc-admin-binding" deleted Service account demo-sdnc-admin-binding deleted.
Error: could not find a ready tiller pod namespace "demo-vid" deleted clusterrolebinding "demo-vid-admin-binding" deleted Service account demo-vid-admin-binding deleted.
E0201 09:24:42.090532 5895 portforward.go:331] an error occurred forwarding 32898 -> 44134: error forwarding port 44134 to pod 9b031662eac045462b5e018cc6829467a799568021c3a97dfe8d7ec6272e1064, uid : exit status 1: 2018/02/01 09:24:42 socat[7805] E connect(6, AF=2 127.0.0.1:44134, 16): Connection refused Error: transport is closing namespace "demo-portal" deleted clusterrolebinding "demo-portal-admin-binding" deleted Service account demo-portal-admin-binding deleted.
Error: release: "demo-policy" not found namespace "demo-policy" deleted clusterrolebinding "demo-policy-admin-binding" deleted Service account demo-policy-admin-binding deleted.
Error: release: "demo-appc" not found Error from server (NotFound): namespaces "demo-appc" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-appc-admin-binding" not found Service account demo-appc-admin-binding deleted.
Error: could not find a ready tiller pod namespace "demo-sdc" deleted clusterrolebinding "demo-sdc-admin-binding" deleted Service account demo-sdc-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-dcaegen2" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-dcaegen2-admin-binding" not found Service account demo-dcaegen2-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-log" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-log-admin-binding" not found Service account demo-log-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-cli" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-cli-admin-binding" not found Service account demo-cli-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-multicloud" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-multicloud-admin-binding" not found Service account demo-multicloud-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-clamp" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-clamp-admin-binding" not found Service account demo-clamp-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-vnfsdk" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-vnfsdk-admin-binding" not found Service account demo-vnfsdk-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-uui" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-uui-admin-binding" not found Service account demo-uui-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-aaf" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-aaf-admin-binding" not found Service account demo-aaf-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-vfc" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-vfc-admin-binding" not found Service account demo-vfc-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-kube2msb" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-kube2msb-admin-binding" not found Service account demo-kube2msb-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-esr" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-esr-admin-binding" not found Service account demo-esr-admin-binding deleted.
Error: could not find a ready tiller pod namespace "demo" deleted Waiting for namespaces termination...
Apart from that, I tried to delete the demo and the onap instances but did not succeed.
Here is the error for the second command (./deleteAll.bash -n onap):
root@omap:~/oom/kubernetes/oneclick# ./deleteAll.bash -n demo
Current kubectl context does not match context specified: ONAP
You are about to delete deployment from: ONAP
To continue enter context name: demo
Your response does not match current context! Skipping delete ...
root@omap:~/oom/kubernetes/oneclick#
Some of the earlier errors are normal - I have seen these on half-deployed systems
If the following shows pods still up (except the 6 for kubernetes) even after a helm delete --purge, then you could also start from scratch - delete all of your kubernetes and rancher docker containers.
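If you do go that route, the nuclear option on the host looks like this (it removes every container on the box, including the rancher/kubernetes system ones - only do this if you plan to reinstall rancher afterwards):

# remove all containers on this host
docker rm -f $(docker ps -aq)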
Also, try to follow the tutorial here "exactly" if this is your first time running onap - or use the included scripts - you won't have any issues that way.
Also, just to be safe - there may be some hardcoding of "onap" - it was hardcoded in places under helm 2.3 because we could not use the tpl template until 2.6 (we only upgraded to 2.8 last week).
I am totally new to ONAP. I followed the tutorial exactly, but once I tried to add the vnc-portal the errors came up, because the vnc-portal instructions say to create a demo deployment for the portal, which conflicts with the onap one (it seems that running two instances is complicated!).
As you suggested I deleted the pods, but one of them is still in the Terminating state - should I ignore that or should I start from scratch?
root@omap:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
demo-sdnc sdnc-dbhost-0 0/2 Terminating 1 2d
kube-system heapster-76b8cd7b5-z99xr 1/1 Running 0 3d
kube-system kube-dns-5d7b4487c9-zc5tx 3/3 Running 735 3d
kube-system kubernetes-dashboard-f9577fffd-c8bgs 1/1 Running 0 3d
kube-system monitoring-grafana-997796fcf-mgqd9 1/1 Running 0 3d
kube-system monitoring-influxdb-56fdcd96b-pnbrj 1/1 Running 0 3d
kube-system tiller-deploy-74f6f6c747-7cvth 1/1 Running 373 3d
Everything is normal except for the failed SDNC container deletion - I have seen this on another system 2 days ago - something went into master for SDNC that caused this - for that particular machine I deleted the VM and raised a new spot VM - a helm delete --purge had no effect - even killing the docker container outside of kubernetes had no effect - I had notes on this and will raise a JIRA - the next system I raised for the CD jobs did not have the issue anymore.
kranthi guttikonda
Hi Michael O'Brien Does this include DCAE as well? I think this is the best way to install ONAP. Does this include any config files as well to talk to openstack cloud to instantiate VNFs?
Michael O'Brien
Sorry,
DCAE is not currently in the repo yet - that will require consolidation of the DCAE Controller (a lot of work)
../oneclick/dcae.sh is listed as "under construction"
As far as I know VNFs like the vFirewall come up, however closed loop operations will need DCAE.
/michael
Gülsüm Atıcı
Hi,
I am planning to install ONAP but couldn't decide to use which way of the setup. Using Full ONAP setup on VMs or Kubernetes based setup with containers. Are both solutions will be developed in the future or development will continue with one of them ?
Do you have any advise about it ?
Kumar Lakshman Kumar
Hi Gatici,
you can use the Kubernetes one. In Beijing even DCAE is containerized. you can use OOM to install the Full ONAP on kubernetes cluster.
Gülsüm Atıcı
Thanks Kumar.
kranthi guttikonda
Thanks Michael O'Brien
Jason Hunt
I see the recently added update about not being able to pull images because of missing credentials. I encountered this yesterday and was able to get a workaround done by creating the secret and embedding the imagePullSecrets to the *-deployment.yaml file.
Here's steps just for the robot:
then added to the robot-deployment.yaml (above volumes):
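(The exact snippets from this post aren't shown above - the following is only a reconstruction based on the later comments in this thread, and the secret name is an assumption.)

# create the pull secret in the robot namespace
kubectl --namespace onap-robot create secret docker-registry myregistrykey \
  --docker-server=nexus3.onap.org:10001 --docker-username=docker \
  --docker-password=docker --docker-email=email@email.com
# then in robot-deployment.yaml (above volumes), reference it, e.g.:
#   imagePullSecrets:
#   - name: myregistrykey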
This has to be done for each namespace, and each script with the image would need to be updated. An alternative that I'm looking at is:
- modify the default service account for the namespace to use this secret as an imagePullSecret.
- kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "myregistrykey"}]}'
- Now, any new pods created in the current namespace will have this added to their spec:
(from https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ )
This would probably have to be done in the createAll.bash script, possibly with the userid/password as parameters to that script.
Is there a suggested approach? If so, I can submit some updates.
Michael O'Brien
Talk about parallel development - google served me
https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#create-a-secret-that-holds-your-authorization-token
kubectl create secret docker-registry regsecret --docker-server=nexus3.onap.org:10001 --docker-username=docker --docker-password=docker --docker-email=email@email.com
testing this now
/michael
Michael O'Brien
Jason,
In our current environment (namespace 1:1 → service 1:1 → pod 1:1 → docker container) it looks like the following single command will have a global scope (no need to modify individual yaml files - a slight alternative to what you have suggested which would work as well.
kubectl create secret docker-registry regsecret --docker-server=nexus3.onap.org:10001 --docker-username=docker --docker-password=docker --docker-email=email@email.com
So no code changes which is good. Currently everything seems to be coming up - but my 70G VM is at 99% so we need more HD space.
Edit: actually even though it looked to work
2017-06-30T19:31 UTC
2017-06-30T19:31 UTC
pulling image "nexus3.onap.org:10001/openecomp/sdc-elasticsearch:1.0-STAGING-latest"
kubelet 172.17.4.99
spec.containers{sdc-es}
2
2017-06-30T19:31 UTC
2017-06-30T19:31 UTC
still getting errors without the namespace for each service like in your example - if we wait long enough
So a better fix Yves and I are testing is to put the line just after the namespace creation in createAll.bash
create_namespace() {
kubectl create namespace $1-$2
kubectl --namespace $1-$2 create secret docker-registry regsecret --docker-server=nexus3.onap.org:10001 --docker-username=docker --docker-password=docker --docker-email=email@email.com
}
/michael
Jason Hunt
Michael,
I'm surprised that it appears to work for you, as it doesn't for my environment. First, you should have to specify the imagePullSecrets for it to work... that can either be done in the yaml or by using the patch serviceaccount command. Second, the scope of the secret for imagePullSecrets is just that namespace:
source: https://kubernetes.io/docs/concepts/containers/images/#creating-a-secret-with-a-docker-config
In your environment, had you previously pulled the images before? I noticed in my environment that it would find a previously pulled image even if I didn't have the authentication credentials. To test that out, I had to add " imagePullPolicy: Always " to the *-deployment.yaml file under the container scope, so it would always try to pull it.
So I think a fix is necessary. I can submit a suggested change to the createAll.bash script that creates the secret and updates the service account in each namespace?
Jason Hunt
I think you'll need to add to the service account, too, so....
I will test now.
Michael O'Brien
We previously saw a successful pull from nexus3 - but that turned out to be a leftover mod in my branch yaml for a specific pod.
Yes, I should know in about 10 min (in the middle of a redeploy) if I need to patch - makes sense because it would assume a magical 1:1 association - what if I created several secrets.
I'll adjust and retest.
btw, thanks for working with us getting Kubernetes/oom up!
/michael
Jason Hunt
My test of the updated create_namespace() method eliminated all of the "no credentials" errors. I have plenty of other errors (most seem to be related to the readiness check timing out), but I think this one is licked.
Is there a better way to track this than the comments here? Jira?
Michael O'Brien
JIRA is
OOM-3
Michael O'Brien
Looks like we will need to specify the secret on each yaml file - because of our mixed nexus3/dockerhub repos
When we try to pull from dockerhub - the secret gets applied
Failed to pull image "oomk8s/readiness-check:1.0.0": unexpected EOF
Error syncing pod, skipping: failed to "StartContainer" for "mso-readiness" with ErrImagePull: "unexpected EOF"
MountVolume.SetUp failed for volume "kubernetes.io/secret/3a7b5084-5dd2-11e7-b73a-08002723e514-default-token-fs361" (spec.Name: "default-token-fs361") pod "3a7b5084-5dd2-11e7-b73a-08002723e514" (UID: "3a7b5084-5dd2-11e7-b73a-08002723e514") with: Get http://127.0.0.1:8080/api/v1/namespaces/onap3-mso/secrets/default-token-fs361: dial tcp 127.0.0.1:8080: getsockopt: connection refused
retesting
Michael O'Brien
Actually our mso images loaded fine after internal retries - bringing up the whole system (except dcae) - so this is without a secret override on the yamls that target nexus3.
It includes your patch line from above
My vagrant vm ran out of HD space at 19G - resizing
v.customize ["modifyhd", "aa296a7e-ae13-4212-a756-5bf2a8461b48", "--resize", "32768"]
wont work on the coreos image - moving up one level of virtualization (docker on virtualbox on vmware-rhel73 in win10) to (docker on virtualbox on win10)
vid still failing on FS
/michael
Vaibhav Chopra
I am getting "Error syncing pod" errors when bringing up (currently only) the aai and vid pods.
I even implemented both of the fixes mentioned in
OOM-3 -
1)
create_namespace() {
kubectl create namespace $1-$2
kubectl --namespace $1-$2 create secret docker-registry regsecret --docker-server=nexus3.onap.org:10001 --docker-username=docker --docker-password=docker --docker-email=email@email.com
kubectl --namespace $1-$2 patch serviceaccount default -p '{"imagePullSecrets": [{"name": "regsecret"}]}'
}
2) Adding below in vid-server-deployment.yaml
Errors:-
aai-service-403142545-f620t
onap-aai
Waiting: PodInitializing
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-aai.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal
Error syncing pod
vid-mariadb-1108617343-zgnbd
onap-vid
Waiting: rpc error: code = 2 desc = failed to start container "c4966c8f8dbfdf460ca661afa94adc7f536fd4b33ed3af7a0857ecdeefed1225": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36/etc/mysql/my.cnf\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-vid.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal
Error: failed to start container "vid-mariadb": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36/etc/mysql/my.cnf\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
Error syncing pod
Is there anything I am missing here?
Michael O'Brien
Vaibhav,
Hi, OOM-3 has been deprecated (it is in the closed state) - the secrets fix is implemented differently now - you don't need the workaround.
Also, the search line limits message is a bug in rancher that you can ignore - it is warning that more than 5 dns search terms were used - not an issue - see my other comments on this page
https://github.com/rancher/rancher/issues/9303
The only real issue is "Error syncing pod" this is an intermittent timing issue (most likely) that we are working on - a faster/more-cores system should see less of this.
If you only have 2 working pods - you might not have run the config-init pod - verify you have /dockerdata-nfs on your host FS.
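A quick check on the host:

ls /dockerdata-nfs/onap    # should contain the per-component config directories created by config-init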
for vid you should see (20170831 1.1 build)
onap-vid vid-mariadb-2932072366-gw6b7 1/1 Running 0 1h
onap-vid vid-server-377438368-bt6zg 1/1 Running 0 1h
/michael
Vaibhav Chopra
Hi Michael,
I have run the config-init, but at that time I was installing components one by one. Now I tried to install everything in one go and got success for the below:
kube-system heapster-4285517626-q0996 1/1 Running 5 19h 10.42.41.231 storm0220.cloud.com
kube-system kube-dns-2514474280-kvcvx 3/3 Running 12 19h 10.42.4.230 storm0220.cloud.com
kube-system kubernetes-dashboard-716739405-fjxpm 1/1 Running 7 19h 10.42.35.168 storm0220.cloud.com
kube-system monitoring-grafana-3552275057-0v7mk 1/1 Running 6 19h 10.42.128.254 storm0220.cloud.com
kube-system monitoring-influxdb-4110454889-vxv19 1/1 Running 6 19h 10.42.159.54 storm0220.cloud.com
kube-system tiller-deploy-737598192-t56wv 1/1 Running 2 19h 10.42.61.18 storm0220.cloud.com
onap-aai hbase-2720973979-p2btt 0/1 Running 0 17h 10.42.12.51 storm0220.cloud.com
onap-appc appc-dbhost-3721796594-v9k2k 1/1 Running 0 17h 10.42.215.107 storm0220.cloud.com
onap-message-router zookeeper-4131483451-r5msz 1/1 Running 0 17h 10.42.76.76 storm0220.cloud.com
onap-mso mariadb-786536066-dx5px 1/1 Running 0 17h 10.42.88.165 storm0220.cloud.com
onap-policy mariadb-1621559354-nbrvh 1/1 Running 0 17h 10.42.108.42 storm0220.cloud.com
onap-portal portaldb-3934803085-fj217 1/1 Running 0 17h 10.42.145.204 storm0220.cloud.com
onap-robot robot-1597903591-fffz3 1/1 Running 0 1h 10.42.253.121 storm0220.cloud.com
onap-sdnc sdnc-dbhost-3459361889-7xdmw 1/1 Running 0 17h 10.42.58.17 storm0220.cloud.com
onap-vid vid-mariadb-1108617343-gsv8f 1/1 Running 0 17h 10.42.175.190 storm0220.cloud.com
but yes, again many of them are stuck with the same error: "Error syncing pod"
and yes, the server I am now using has 128GB RAM. (Though I have configured the proxy in the best known manner - if you think this could also relate to the proxy, I will dig more in that direction.)
BR/
VC
Michael O'Brien
I'll contact you directly about proxy access.
Personally I try to run on machines/VMs outside the corporate proxy - to avoid the proxy part of the triage equation
/michael
Vaibhav Chopra
Sure, Thanks Frank,
Will check the proxy. Anyway, other than the proxy, whenever you get to know a fix for "Error syncing pod", please update us.
Currently I have 20 out of 34 onap pods running fine and the rest are failing with "Error syncing pod".
BR/
VC
Michael O'Brien
Update: containers are loading now - for example both pods for VID come up ok if we first run the config-init pod to bring up the config mounts. Also there is an issue with unresolved DNS entries that is fixed temporarily by adding to /etc/resolv.conf
1) mount config files
root@obriensystemsucont0:~/onap/oom/kubernetes/config# kubectl create -f pod-config-init.yaml
pod "config-init" created
2) fix DNS search
https://github.com/rancher/rancher/issues/9303
Fix DNS resolution before running any more pods ( add service.ns.svc.cluster.local)
root@obriensystemskub0:~/oom/kubernetes/oneclick# cat /etc/resolv.conf
nameserver 192.168.241.2
search localdomain service.ns.svc.cluster.local
3) run or restart VID service as an example (one of 10 failing pods)
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces
onap-vid vid-mariadb-1357170716-k36tm 1/1 Running 0 10m
onap-vid vid-server-248645937-8tt6p 1/1 Running 0 10m
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl --namespace onap-vid logs -f vid-server-248645937-8tt6p
16-Jul-2017 02:46:48.707 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 22520 ms
tomcat comes up on 127.0.0.1:30200 for this colocated setup
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl get services --all-namespaces -o wide
onap-vid vid-mariadb None <none> 3306/TCP 1h app=vid-mariadb
onap-vid vid-server 10.43.14.244 <nodes> 8080:30200/TCP 1h app=vid-server
Michael O'Brien
Good news – 32 of 33 pods are up (sdnc-portal is going through a restart).
Ran 2 parallel Rancher systems on 48G Ubuntu 16.04.2 VM’s on two 64G servers
Stats: Without DCAE (which is up to 40% of ONAP) we run at 33G – so I would expect a full system to be around 50G which means we can run on a P70 Thinkpad laptop with 64G.
Had to add some dns-search domains for k8s in interfaces to appear in resolv.conf after running the config pod.
Issues:
after these 2 config changes the pods come up within 25 min except policy-drools which takes 45 min (on 1 machine but not the other) and sdnc-portal (which is having issues with some node downloads)
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl --namespace onap-sdnc logs -f sdnc-portal-3375812606-01s1d | grep ERR
npm ERR! fetch failed https://registry.npmjs.org/is-utf8/-/is-utf8-0.2.1.tgz
I’ll look at instantiating the vFirewall VM’s and integrating DCAE next.
on 5820k 4.1GHz 12 vCores 48g Ubuntu 16.04.2 VM on 64g host
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system heapster-859001963-bmlff 1/1 Running 5 43m 10.42.143.118 obriensystemskub0
kube-system kube-dns-1759312207-0x1xx 3/3 Running 8 43m 10.42.246.144 obriensystemskub0
kube-system kubernetes-dashboard-2463885659-jl5jf 1/1 Running 5 43m 10.42.117.156 obriensystemskub0
kube-system monitoring-grafana-1177217109-7gkl6 1/1 Running 4 43m 10.42.79.40 obriensystemskub0
kube-system monitoring-influxdb-1954867534-8nr2q 1/1 Running 5 43m 10.42.146.215 obriensystemskub0
kube-system tiller-deploy-1933461550-w77c5 1/1 Running 4 43m 10.42.1.66 obriensystemskub0
onap-aai aai-service-301900780-wp3w1 1/1 Running 0 25m 10.42.104.101 obriensystemskub0
onap-aai hbase-2985919495-zfs2c 1/1 Running 0 25m 10.42.208.135 obriensystemskub0
onap-aai model-loader-service-2352751609-4qb0x 1/1 Running 0 25m 10.42.25.139 obriensystemskub0
onap-appc appc-4266112350-gscxh 1/1 Running 0 25m 10.42.90.128 obriensystemskub0
onap-appc appc-dbhost-981835105-lp6tn 1/1 Running 0 25m 10.42.201.58 obriensystemskub0
onap-appc appc-dgbuilder-939982213-41znl 1/1 Running 0 25m 10.42.30.127 obriensystemskub0
onap-message-router dmaap-1381770224-c5xp8 1/1 Running 0 25m 10.42.133.232 obriensystemskub0
onap-message-router global-kafka-3488253347-zt8x9 1/1 Running 0 25m 10.42.235.227 obriensystemskub0
onap-message-router zookeeper-3757672320-bxkvs 1/1 Running 0 25m 10.42.14.4 obriensystemskub0
onap-mso mariadb-2610811658-r22z9 1/1 Running 0 25m 10.42.46.110 obriensystemskub0
onap-mso mso-2217182437-1r8fm 1/1 Running 0 25m 10.42.120.204 obriensystemskub0
onap-policy brmsgw-554754608-gssf8 1/1 Running 0 25m 10.42.84.128 obriensystemskub0
onap-policy drools-1184532483-kg8sr 1/1 Running 0 25m 10.42.62.198 obriensystemskub0
onap-policy mariadb-546348828-1ck21 1/1 Running 0 25m 10.42.118.120 obriensystemskub0
onap-policy nexus-2933631225-s1qjz 1/1 Running 0 25m 10.42.73.217 obriensystemskub0
onap-policy pap-235069217-qdf2r 1/1 Running 0 25m 10.42.157.211 obriensystemskub0
onap-policy pdp-819476266-zvncc 1/1 Running 0 25m 10.42.38.47 obriensystemskub0
onap-policy pypdp-3646772508-n801j 1/1 Running 0 25m 10.42.244.206 obriensystemskub0
onap-portal portalapps-157357486-gjnnc 1/1 Running 0 25m 10.42.83.144 obriensystemskub0
onap-portal portaldb-351714684-1n956 1/1 Running 0 25m 10.42.8.80 obriensystemskub0
onap-portal vnc-portal-1027553126-h6dhd 1/1 Running 0 25m 10.42.129.60 obriensystemskub0
onap-robot robot-44708506-t10kk 1/1 Running 0 31m 10.42.185.118 obriensystemskub0
onap-sdc sdc-be-4018435632-3k6k2 1/1 Running 0 25m 10.42.246.193 obriensystemskub0
onap-sdc sdc-cs-2973656688-kktn8 1/1 Running 0 25m 10.42.240.176 obriensystemskub0
onap-sdc sdc-es-2628312921-bg0dg 1/1 Running 0 25m 10.42.67.214 obriensystemskub0
onap-sdc sdc-fe-4051669116-3b9bh 1/1 Running 0 25m 10.42.42.203 obriensystemskub0
onap-sdc sdc-kb-4011398457-fgpkl 1/1 Running 0 25m 10.42.47.218 obriensystemskub0
onap-sdnc sdnc-1672832555-1h4s7 1/1 Running 0 25m 10.42.120.148 obriensystemskub0
onap-sdnc sdnc-dbhost-2119410126-48mt9 1/1 Running 0 25m 10.42.133.166 obriensystemskub0
onap-sdnc sdnc-dgbuilder-730191098-gj6g9 1/1 Running 0 25m 10.42.154.99 obriensystemskub0
onap-sdnc sdnc-portal-3375812606-01s1d 0/1 Running 0 25m 10.42.105.164 obriensystemskub0
onap-vid vid-mariadb-1357170716-vnmhr 1/1 Running 0 28m 10.42.218.225 obriensystemskub0
onap-vid vid-server-248645937-m67r9 1/1 Running 0 28m 10.42.227.81 obriensystemskub0
nagaraja sr
Michael O'Brien - (deprecated as of 20170508) - use obrienlabs: I've got to the point where I can access the portal login page, but after entering the credentials it keeps redirecting to port 8989 and fails, instead of the externally mapped port (30215 in my case). Any thoughts?
I'm running on GCE with 40GB and only running sdc, message-router and portal for now.
Michael O'Brien
Nagaraja, yes, good question. I have actually been able to get to the point of running portal - as the 1.0.0 system is pretty stable now
onap-portal portalapps 255.255.255.255 <nodes> 8006:30213/TCP,8010:30214/TCP,8989:30215/TCP 2h
I was recording a demo and ran into the same issue - I will raise a JIRA as we fix this and post here
http://portal.api.simpledemo.openecomp.org:30215/ECOMPPORTAL/login.htm
redirects to
Request URL:
http://portal.api.simpledemo.openecomp.org:8989/ECOMPPORTAL/applicationsHome
because of hardcoded parameters like the following in the DockerFile
Eddy Hautot
Hello, was there a workaround for this finally?
I ran the OOM installation from scratch and managed to log in to the Portal by changing the port back to 30215 after the redirect from the login.
Also, when I log in with the cs0008 user and click on SDC, I get: "can't establish a connection to the server at sdc.api.simpledemo.onap.org:8181" (should this be changed to port 30206?)
Do you know which config has to be changed for this?
Thank you
Mike Elliott
Are you accessing the ECOMP Portal via the 'onap-portal vnc-portal-1027553126-h6dhd' container?
This container was added to the standard ONAP deployment so one may VNC into the ONAP deployment instance (namespace) and have networking fully resolved within K8s.
Michael O'Brien
Mike, I was just writing a question to you - yes, it looks like I am using the wrong container - reworking now
thank you
Michael O'Brien
Nagaraja,
Portal access via the vnc-portal container (port 30211) is now documented above in
Running ONAP using the vnc-portal
/michael
Vaibhav Chopra
Hi all,
I am new to this Kubernetes installation of ONAP and am installing the ONAP components one by one on my VM (due to memory constraints).
I want to see if the pods are working fine.
I launched the robot component:
onap-robot robot-1597903591-1tx35 1/1 Running 0 2h 10.42.104.187 localhost
and logged in to same via
kubectl -n onap-robot exec -it robot-1597903591-1tx35 /bin/bash
Now do I need to mount some directory to see the containers, and how will the docker process run in it?
BR/
VC
Vaibhav Chopra
The docker processes are not running on their own, maybe due to the proxied internet being used. I am trying to run the install and setup manually by logging in to each component.
Michael O'Brien
Vaibhav,
Hi, there are a combination of files - some are in the container itself - see /var/opt
some are off the shared file system on the host - see /dockerdata-nfs
In the case of robot - you have spun up one pod - each pod has a single docker container, to see the other pods/containers - kubectl into each like you have into robot - just change the pod name. kubectl is an abstraction on top of docker - so you don't need to directly access docker containers.
/michael
Geora Barsky
Vaibhav, if you are trying to see the status of the pod or look at the log file, you can do it also through Rancher / Kubernetes dashboard :
Vaibhav Chopra
Hi Michael,
Yes, I can see the mounted directories and found robot_install.sh in /var/opt/OpenECOMP_ETE/demo/boot
On the K8s dashboard and CLI, the pod is in the Running state, but when I log in (via kubectl) to any of them, I am unable to see any docker process running via docker ps. (Even docker itself is not installed.)
I think this ideally is taken care of by the pod itself, right? Or do we need to go inside each component and run its specific installation script?
BR/
VC
Michael O'Brien
Vaibhav, Hi, the architecture of kubernetes is such that it manages docker containers - we are not running docker on docker. Docker ps will only be possible on the host machine(s)/vm(s) that kubernetes is running on - you will see the wrapper docker containers running the kubernetes and rancher undercloud.
When you "kubectl exec -it" - into a pod you have entered a docker container the same as a "docker exec -it" at that point you are in a container process, try doing a "ps -ef | grep java" to see if a java process is running for example. Note that by the nature of docker most containers will have a minimal linux install - so some do not include the ps command for example.
If you check the instructions above you will see the first step is to install docker 1.12 only on the host - as you end up with 1 or more hosts running a set of docker containers after ./createAll.bash finishes
example - try the mso jboss container - it is one of the heavyweight containers
root@ip-172-31-93-122:~# kubectl -n onap-mso exec -it mso-371905462-w0mcj bash
root@mso-371905462-w0mcj:/# ps -ef | grep java
root 1920 1844 0 Aug27 ? 00:28:33 java -D[Standalone] -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Xms64m -Xmx4g -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=1g -Djboss.bind.address=0.0.0.0 -Djboss.bind.address.management=0.0.0.0 -Dmso.db=MARIADB -Dmso.config.path=/etc/mso/config.d/ -Dorg.jboss.boot.log.file=/opt/jboss/standalone/log/server.log -Dlogging.configuration=file:/opt/jboss/standalone/configuration/logging.properties -jar /opt/jboss/jboss-modules.jar -mp /opt/jboss/modules org.jboss.as.standalone -Djboss.home.dir=/opt/jboss -Djboss.server.base.dir=/opt/jboss/standalone -c standalone-full-ha-mso.xml
if you want to see the k8s wrapped containers - do a docker ps on the host
root@ip-172-31-93-122:~# docker ps | grep mso
9fed2b7ebd1d nexus3.onap.org:10001/openecomp/mso@sha256:ab3a447956577a0f339751fb63cc2659e58b9f5290852a90f09f7ed426835abe "/docker-files/script" 4 days ago Up 4 days k8s_mso_mso-371905462-w0mcj_onap-mso_11da22bf-8b3d-11e7-9e1a-0289899d0a5f_0
e4171a2b73d8 nexus3.onap.org:10001/mariadb@sha256:3821f92155bf4311a59b7ec6219b79cbf9a42c75805000a7c8fe5d9f3ad28276 "/docker-entrypoint.s" 4 days ago Up 4 days k8s_mariadb_mariadb-786536066-87g9d_onap-mso_11bc6958-8b3d-11e7-9e1a-0289899d0a5f_0
8ba86442fbde gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 days ago Up 4 days k8s_POD_mso-371905462-w0mcj_onap-mso_11da22bf-8b3d-11e7-9e1a-0289899d0a5f_0
f099c5613bf1 gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 days ago Up 4 days k8s_POD_mariadb-786536066-87g9d_onap-mso_11bc6958-8b3d-11e7-9e1a-0289899d0a5f_0
Cyril Nleng
Hi all,
I am new to kubernetes installation of ONAP and have problems cloning onap repository.
I have tried git clone -b release-1.0.0 http://gerrit.onap.org/r/oom
but ended up with the following error
fatal: unable to access 'http://gerrit.onap.org/r/oom/': The requested URL returned error: 403
I also tried to use ssh: git clone -b release-1.0.0 ssh://cnleng@gerrit.onap.org:29418/oom
but I cannot access the settings on https://gerrit.onap.org (I already have an account with the Linux Foundation) to copy my ssh keys.
Any help will be appreciated.
Thanks
Michael O'Brien
A 403 in your case might be due to your proxy or firewall - check access from outside your company network if possible.
Verified the URL
root@ip-172-31-90-90:~/test# git clone -b release-1.0.0 http://gerrit.onap.org/r/oom
Cloning into 'oom'...
remote: Counting objects: 896, done
remote: Finding sources: 100% (262/262)
remote: Total 1701 (delta 96), reused 1667 (delta 96)
Receiving objects: 100% (1701/1701), 1.08 MiB | 811.00 KiB/s, done.
Resolving deltas: 100% (588/588), done.
Checking connectivity... done.
If you log in to gerrit and navigate to the oom project, it will supply you with the anonymous http, https and ssh URLs - try each of them; they should work.
Geora Barsky
Hi, I am trying to install ONAP components though oom, but getting the following errors:
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-appc.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal
I tried to edit /etc/resolv.conf according to Michael's comment above:
nameserver <server ip>
search localdomain service.ns.svc.cluster.local
but it does not seem to help.
Please advise how to resolve this DNS issue.
Thanks
Geora
Michael O'Brien
Geora, hi, that is a red herring unfortunately - there is a bug in rancher where they add more than 5 domains to the search tree - you can ignore these. The resolv.conf change turns out to have no effect - it has been removed except in the comment history.
https://github.com/rancher/rancher/issues/9303
/michael
Michael O'Brien
todo: update table/diagram on aai for 1.1 coming in
root@obriensystemskub0:~/11/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
default config-init 0/1 Completed 0 45d
kube-system heapster-859001963-kz210 1/1 Running 5 46d
kube-system kube-dns-1759312207-jd5tf 3/3 Running 8 46d
kube-system kubernetes-dashboard-2463885659-xv986 1/1 Running 4 46d
kube-system monitoring-grafana-1177217109-sm5nq 1/1 Running 4 46d
kube-system monitoring-influxdb-1954867534-vvb84 1/1 Running 4 46d
kube-system tiller-deploy-1933461550-gdxch 1/1 Running 4 46d
onap config-init 0/1 Completed 0 1h
onap-aai aai-dmaap-2612279050-g4qjj 1/1 Running 0 1h
onap-aai aai-kafka-3336540298-kshzc 1/1 Running 0 1h
onap-aai aai-resources-2582573456-n1v1q 0/1 CrashLoopBackOff 9 1h
onap-aai aai-service-3847504356-03rk2 0/1 Init:0/1 3 1h
onap-aai aai-traversal-1020522763-njrw7 0/1 Completed 10 1h
onap-aai aai-zookeeper-3839400401-160pk 1/1 Running 0 1h
onap-aai data-router-1134329636-f5g2j 1/1 Running 0 1h
onap-aai elasticsearch-2888468814-4pmgd 1/1 Running 0 1h
onap-aai gremlin-1948549042-j56p9 0/1 CrashLoopBackOff 7 1h
onap-aai hbase-1088118705-f29c1 1/1 Running 0 1h
onap-aai model-loader-service-784161734-3njbr 1/1 Running 0 1h
onap-aai search-data-service-237180539-0sj6c 1/1 Running 0 1h
onap-aai sparky-be-3826115676-c2wls 1/1 Running 0 1h
onap-appc appc-2493901092-041m9 1/1 Running 0 1h
onap-appc appc-dbhost-3869943665-5d0vb 1/1 Running 0 1h
onap-appc appc-dgbuilder-2279934547-t2qqx 0/1 Running 1 1h
onap-message-router dmaap-3009751734-w59nn 1/1 Running 0 1h
onap-message-router global-kafka-1350602254-f8vj6 1/1 Running 0 1h
onap-message-router zookeeper-2151387536-sw7bn 1/1 Running 0 1h
onap-mso mariadb-3820739445-qjrmn 1/1 Running 0 1h
onap-mso mso-278039889-4379l 1/1 Running 0 1h
onap-policy brmsgw-1958800448-p855b 1/1 Running 0 1h
onap-policy drools-3844182126-31hmg 0/1 Running 0 1h
onap-policy mariadb-2047126225-4hpdb 1/1 Running 0 1h
onap-policy nexus-851489966-h1l4b 1/1 Running 0 1h
onap-policy pap-2713970993-kgssq 1/1 Running 0 1h
onap-policy pdp-3122086202-dqfz6 1/1 Running 0 1h
onap-policy pypdp-1774542636-vp3tt 1/1 Running 0 1h
onap-portal portalapps-2603614056-4030t 1/1 Running 0 1h
onap-portal portaldb-122537869-8h4hd 1/1 Running 0 1h
onap-portal portalwidgets-3462939811-9rwtl 1/1 Running 0 1h
onap-portal vnc-portal-2396634521-7zlvf 0/1 Init:2/5 3 1h
onap-robot robot-2697244605-cbkzp 1/1 Running 0 1h
onap-sdc sdc-be-2266987346-r321s 0/1 Running 0 1h
onap-sdc sdc-cs-1003908407-46k1q 1/1 Running 0 1h
onap-sdc sdc-es-640345632-7ldhv 1/1 Running 0 1h
onap-sdc sdc-fe-783913977-ccg59 0/1 Init:0/1 3 1h
onap-sdc sdc-kb-1525226917-j2n48 1/1 Running 0 1h
onap-sdnc sdnc-2490795740-pfwdz 1/1 Running 0 1h
onap-sdnc sdnc-dbhost-2647239646-5spg0 1/1 Running 0 1h
onap-sdnc sdnc-dgbuilder-1138876857-1b40z 0/1 Running 0 1h
onap-sdnc sdnc-portal-3897220020-0tt9t 0/1 Running 1 1h
onap-vid vid-mariadb-2479414751-n33qf 1/1 Running 0 1h
onap-vid vid-server-1654857885-jd1jc 1/1 Running 0 1h
20170902 update - everything up (minus to-be-merged-dcae)
root@ip-172-31-93-160:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config-init 0/1 Completed 0 21m
onap-aai aai-service-3321436576-2snd6 0/1 PodInitializing 0 18m
onap-policy drools-3066421234-rbpr9 0/1 Init:0/1 1 18m
onap-portal vnc-portal-700404418-r61hm 0/1 Init:2/5 1 18m
onap-sdc sdc-fe-3467675014-v8jxm 0/1 Running 0 18m
root@ip-172-31-93-160:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces | grep 0/1
root@ip-172-31-93-160:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-7wdct 1/1 Running 0 1d
kube-system kube-dns-2514474280-kmd6v 3/3 Running 3 1d
kube-system kubernetes-dashboard-716739405-xxn5k 1/1 Running 0 1d
kube-system monitoring-grafana-3552275057-hvfw8 1/1 Running 0 1d
kube-system monitoring-influxdb-4110454889-7s5fj 1/1 Running 0 1d
kube-system tiller-deploy-737598192-jpggg 1/1 Running 0 1d
onap-aai aai-dmaap-522748218-5rw0v 1/1 Running 0 21m
onap-aai aai-kafka-2485280328-6264m 1/1 Running 0 21m
onap-aai aai-resources-3302599602-fn4xm 1/1 Running 0 21m
onap-aai aai-service-3321436576-2snd6 1/1 Running 0 21m
onap-aai aai-traversal-2747464563-3c8m7 1/1 Running 0 21m
onap-aai aai-zookeeper-1010977228-l2h3h 1/1 Running 0 21m
onap-aai data-router-1397019010-t60wm 1/1 Running 0 21m
onap-aai elasticsearch-2660384851-k4txd 1/1 Running 0 21m
onap-aai gremlin-1786175088-m39jb 1/1 Running 0 21m
onap-aai hbase-3880914143-vp8zk 1/1 Running 0 21m
onap-aai model-loader-service-226363973-wx6s3 1/1 Running 0 21m
onap-aai search-data-service-1212351515-q4k68 1/1 Running 0 21m
onap-aai sparky-be-2088640323-h2pbx 1/1 Running 0 21m
onap-appc appc-1972362106-4zqh8 1/1 Running 0 21m
onap-appc appc-dbhost-2280647936-s041d 1/1 Running 0 21m
onap-appc appc-dgbuilder-2616852186-g9sng 1/1 Running 0 21m
onap-message-router dmaap-3565545912-w5lp4 1/1 Running 0 21m
onap-message-router global-kafka-701218468-091rt 1/1 Running 0 21m
onap-message-router zookeeper-555686225-vdp8w 1/1 Running 0 21m
onap-mso mariadb-2814112212-zs7lk 1/1 Running 0 21m
onap-mso mso-2505152907-xdhmb 1/1 Running 0 21m
onap-policy brmsgw-362208961-ks6jb 1/1 Running 0 21m
onap-policy drools-3066421234-rbpr9 1/1 Running 0 21m
onap-policy mariadb-2520934092-3jcw3 1/1 Running 0 21m
onap-policy nexus-3248078429-4k29f 1/1 Running 0 21m
onap-policy pap-4199568361-p3h0p 1/1 Running 0 21m
onap-policy pdp-785329082-3c8m5 1/1 Running 0 21m
onap-policy pypdp-3381312488-q2z8t 1/1 Running 0 21m
onap-portal portalapps-2799319019-00qhb 1/1 Running 0 21m
onap-portal portaldb-1564561994-50mv0 1/1 Running 0 21m
onap-portal portalwidgets-1728801515-r825g 1/1 Running 0 21m
onap-portal vnc-portal-700404418-r61hm 1/1 Running 0 21m
onap-robot robot-349535534-lqsvp 1/1 Running 0 21m
onap-sdc sdc-be-1839962017-n3hx3 1/1 Running 0 21m
onap-sdc sdc-cs-2640808243-tc9ck 1/1 Running 0 21m
onap-sdc sdc-es-227943957-f6nfv 1/1 Running 0 21m
onap-sdc sdc-fe-3467675014-v8jxm 1/1 Running 0 21m
onap-sdc sdc-kb-1998598941-57nj1 1/1 Running 0 21m
onap-sdnc sdnc-250717546-xmrmw 1/1 Running 0 21m
onap-sdnc sdnc-dbhost-3807967487-tdr91 1/1 Running 0 21m
onap-sdnc sdnc-dgbuilder-3446959187-dn07m 1/1 Running 0 21m
onap-sdnc sdnc-portal-4253352894-hx9v8 1/1 Running 0 21m
onap-vid vid-mariadb-2932072366-n5qw1 1/1 Running 0 21m
onap-vid vid-server-377438368-kn6x4 1/1 Running 0 21m
root@ip-172-31-93-160:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces | grep 0/1
health passes except for to-be-merged dcae
root@ip-172-31-93-160:/dockerdata-nfs/onap/robot# ls
authorization demo-docker.sh demo-k8s.sh ete-docker.sh ete-k8s.sh eteshare robot
root@ip-172-31-93-160:/dockerdata-nfs/onap/robot# ./ete-docker.sh health
------------------------------------------------------------------------------
Basic SDNGC Health Check | PASS |
------------------------------------------------------------------------------
Basic A&AI Health Check | PASS |
------------------------------------------------------------------------------
Basic Policy Health Check | PASS |
------------------------------------------------------------------------------
Basic MSO Health Check | PASS |
------------------------------------------------------------------------------
Basic ASDC Health Check | PASS |
------------------------------------------------------------------------------
Basic APPC Health Check | PASS |
------------------------------------------------------------------------------
Basic Portal Health Check | PASS |
------------------------------------------------------------------------------
Basic Message Router Health Check | PASS |
------------------------------------------------------------------------------
Basic VID Health Check | PASS |
nagaraja sr
Has anyone managed to run ONAP on Kubernetes with more than one node? i'm unclear about how the /dockerdata-nfs volume mount works in the case of multiple nodes.
1) In my Azure setup, I have one master node and 4 agent nodes (Standard D3 - 4 CPU / 14GB). After running the config-init pod (and it completing), I do not see the /dockerdata-nfs directory being created on the master node. I am not sure how to check this directory on all the agent nodes. Is this directory expected to be created on all the agent nodes? If so, are they kept synchronized?
2) After the cluster is restarted, there is a possibility that pods will run on a different set of nodes, so if /dockerdata-nfs is not kept in sync between the agent nodes, then the data will not be persisted.
PS: I did not use rancher. I created the k8s cluster using acs-engine.
Shane Daniel
Hi nagaraja,
The mounting of the shared dockerdata-nfs volume does not appear to happen automatically. You can install nfs-kernel-server and mount a shared drive manually. If you are running rancher on the master node (the one with the files in the /dockerdata-nfs directory), mount that directory to the agent nodes:
On Master:
# apt-get install nfs-kernel-server
Modify /etc/exports to share directory from master to agent nodes
# vi /etc/exports
#systemctl restart nfs-kernel-server
On client nodes:
#apt-get install nfs-common
delete existing data:
#rm -fr dockerdata-nfs/
#mkdir -p /dockerdata-nfs
#mount <master ip>:/dockerdata-nfs/ /dockerdata-nfs/
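For reference, an example export entry on the master (the export options are an assumption - tighten them to your security needs):

# append a share for /dockerdata-nfs and restart the NFS server
echo '/dockerdata-nfs *(rw,no_root_squash,no_subtree_check)' >> /etc/exports
systemctl restart nfs-kernel-server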
Cyril Nleng
Hi All,
I am trying to install ONAP on Kubernetes and I got the following error while trying to run ./createConfig.sh -n onap command:
sudo: unable to execute ./createConfig.sh: No such file or directory
Hangup
Does anyone have an idea? (kubernetes /helm is already up and running)
Thanks,
Borislav Glozman
Please check whether the file is in DOS format. You might want to run dos2unix on it (and the others).
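A quick way to check and fix this (a sketch - assumes dos2unix is installed):
file createConfig.sh    # reports "with CRLF line terminators" when the file is in DOS format
dos2unix createConfig.sh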
Cyril Nleng
Thank you for your help. Indeed this was the cause of the problem.
Michael O'Brien
We need to chmod 755 the file - it was committed with the wrong permissions to the 1.0.0 branch.
OOM-218 - Getting issue details... STATUS
the instructions reference this.
% chmod 777 createConfig.sh (1.0 branch only)
Cyril Nleng
Hi All,
I am trying to install ONAP on Kubernetes and I got the following error while trying to run the ./createAll.bash -n onap -a robot|appc|aai command:
Command 'mppc' from package 'makepp' (universe)
Command 'ppc' from package 'pearpc' (universe)
appc: command not found
No command 'aai' found, did you mean:
Command 'axi' from package 'afnix' (universe)
Command 'ali' from package 'nmh' (universe)
Command 'ali' from package 'mailutils-mh' (universe)
Command 'aa' from package 'astronomical-almanac' (universe)
Command 'fai' from package 'fai-client' (universe)
Command 'cai' from package 'emboss' (universe)
aai: command not found
Does anyone have an idea? (kubernetes /helm is already up and running)
Thanks,
nagaraja sr
You need to run the command for each ONAP component one by one, i.e.:
./createAll.bash -n onap -a robot
when that's completed,
./createAll.bash -n onap -a aai
./createAll.bash -n onap -a appc
and so on for each onap component you wish to install.
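If you prefer a single command, a small loop over the components works too (just a sketch, run from the same oneclick directory):
for app in robot aai appc; do ./createAll.bash -n onap -a $app; done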
Cyril Nleng
Thanks for the help,
but right now it looks like Kubernetes is not able to pull an image from the registry
kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
onap-robot robot-3494393958-8fl0q 0/1 ImagePullBackOff 0 5m
Do you have any idea why?
Michael O'Brien
There was an issue (happens periodically) with the nexus3 repo.
Also check that you are not having proxy issues.
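To see the exact pull error behind an ImagePullBackOff (proxy, DNS or a nexus3 outage), describing the pod is usually enough - for example, using the pod name from the listing above:
kubectl describe pod robot-3494393958-8fl0q -n onap-robot
kubectl get events -n onap-robot --sort-by='.lastTimestamp'
A manual docker pull of the image shown in the Events section will then confirm whether the node itself can reach nexus3.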
Usually we post the ONAP partner we are with either via our email or on our profile - thank you in advance.
/michael
Cyril Nleng
Hi All,
I am trying to install ONAP on Kubernetes and I got the following behavior while trying to run the ./createAll.bash -n onap -a robot|appc|aai command:
but right now it looks like Kubernetes is not able to pull an image from the registry
kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
onap-robot robot-3494393958-8fl0q 0/1 ImagePullBackOff 0 5m
Do you have any idea why?
Alex Lee
Hi, Michael O'Brien. I am trying to install ONAP the way described above and encountered a problem.
The hbase pod in Kubernetes returns “Readiness probe failed: dial tcp 10.42.76.162:8020: getsockopt: connection refused”. It seems like the hbase service is not started as expected. The container named hbase logs the following in Rancher:
Starting namenodes on [hbase]
hbase: chown: missing operand after '/opt/hadoop-2.7.2/logs'
hbase: Try 'chown --help' for more information.
hbase: starting namenode, logging to /opt/hadoop-2.7.2/logs/hadoop--namenode-hbase.out
localhost: starting datanode, logging to /opt/hadoop-2.7.2/logs/hadoop--datanode-hbase.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.2/logs/hadoop--secondarynamenode-hbase.out
starting zookeeper, logging to /opt/hbase-1.2.3/bin/../logs/hbase--zookeeper-hbase.out
starting master, logging to /opt/hbase-1.2.3/bin/../logs/hbase--master-hbase.out
starting regionserver, logging to /opt/hbase-1.2.3/bin/../logs/hbase--1-regionserver-hbase.out
Michael O'Brien
Nexus3 usually has intermittent connection issues - you may have to wait up to 30 min. Yesterday I was able to bring it up on 3 systems with the 20170906 tag (all outside the firewall).
I assume MSO (earlier in the startup) worked - so you don't have a proxy issue
/michael
Michael O'Brien
verified
root@ip-172-31-93-122:~/oom_20170908/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-q5vns 1/1 Running 3 12d
kube-system kube-dns-646531078-tzhbj 3/3 Running 6 12d
kube-system kubernetes-dashboard-716739405-zc56m 1/1 Running 3 12d
kube-system monitoring-grafana-3552275057-gwcv0 1/1 Running 3 12d
kube-system monitoring-influxdb-4110454889-m29w3 1/1 Running 3 12d
kube-system tiller-deploy-737598192-rndtq 1/1 Running 3 12d
onap config 0/1 Completed 0 10m
onap-aai aai-resources-3302599602-6mggg 1/1 Running 0 7m
onap-aai aai-service-3321436576-qc7tx 1/1 Running 0 7m
onap-aai aai-traversal-2747464563-bvbqn 1/1 Running 0 7m
onap-aai data-router-1397019010-d4bh1 1/1 Running 0 7m
onap-aai elasticsearch-2660384851-r9v3k 1/1 Running 0 7m
onap-aai gremlin-1786175088-q5z1k 1/1 Running 1 7m
onap-aai hbase-3880914143-0nn8x 1/1 Running 0 7m
onap-aai model-loader-service-226363973-2wr0k 1/1 Running 0 7m
onap-aai search-data-service-1212351515-b04rz 1/1 Running 0 7m
onap-aai sparky-be-2088640323-kg4ts 1/1 Running 0 7m
onap-appc appc-1972362106-j27bp 1/1 Running 0 7m
onap-appc appc-dbhost-4156477017-13mhs 1/1 Running 0 7m
onap-appc appc-dgbuilder-2616852186-4rtxz 1/1 Running 0 7m
onap-message-router dmaap-3565545912-nqcs1 1/1 Running 0 8m
onap-message-router global-kafka-3548877108-x4gqb 1/1 Running 0 8m
onap-message-router zookeeper-2697330950-6l8ht 1/1 Running 0 8m
onap-mso mariadb-2019543522-1jc0v 1/1 Running 0 8m
onap-mso mso-2505152907-cj74x 1/1 Running 0 8m
onap-policy brmsgw-3913376880-5v5p4 1/1 Running 0 7m
onap-policy drools-873246297-1h059 1/1 Running 0 7m
onap-policy mariadb-922099840-qbpj7 1/1 Running 0 7m
onap-policy nexus-2268491532-pqt8t 1/1 Running 0 7m
onap-policy pap-1694585402-7mdtg 1/1 Running 0 7m
onap-policy pdp-3638368335-zptqk 1/1 Running 0 7m
onap-portal portalapps-2799319019-twhn2 1/1 Running 0 8m
onap-portal portaldb-2714869748-bt1c8 1/1 Running 0 8m
onap-portal portalwidgets-1728801515-gr616 1/1 Running 0 8m
onap-portal vnc-portal-1920917086-s9mj9 1/1 Running 0 8m
onap-robot robot-1085296500-jkkln 1/1 Running 0 8m
onap-sdc sdc-be-1839962017-nh4bm 1/1 Running 0 7m
onap-sdc sdc-cs-428962321-hhnmk 1/1 Running 0 7m
onap-sdc sdc-es-227943957-mrnng 1/1 Running 0 7m
onap-sdc sdc-fe-3467675014-nq72v 1/1 Running 0 7m
onap-sdc sdc-kb-1998598941-2bd73 1/1 Running 0 7m
onap-sdnc sdnc-250717546-0dtr7 1/1 Running 0 8m
onap-sdnc sdnc-dbhost-2348786256-96gvr 1/1 Running 0 8m
onap-sdnc sdnc-dgbuilder-3446959187-9993t 1/1 Running 0 8m
onap-sdnc sdnc-portal-4253352894-sd7mg 1/1 Running 0 8m
onap-vid vid-mariadb-2940400992-mmtbn 1/1 Running 0 8m
onap-vid vid-server-377438368-z3tfv 1/1 Running 0 8m
From: onap-discuss-bounces@lists.onap.org [mailto:onap-discuss-bounces@lists.onap.org] On Behalf Of Mandeep Khinda
Sent: Friday, September 8, 2017 14:36
To: onap-discuss@lists.onap.org
Subject: [onap-discuss] [oom] config pod changes
OOM users,
I’ve just pushed a change that requires a re-build of the /dockerdata-nfs/onap/ mount on your K8s host.
Basically, what I’ve tried to do is port over the heat stack version of ONAP’s configuration mechanism. The heat way of running ONAP writes files to /opt/config/ based on the stack’s environment file that has the details related to each user’s environment. These values are then swapped into the various VMs’ containers using scripts.
Now that we are using helm for OOM, I was able to do something similar in order to start trying to run the vFW/vLB demo use cases.
This story tracks the functionality that was needed: https://jira.onap.org/browse/OOM-277
I have also been made aware that this change requires K8s 1.6 as I am making use of the “envFrom” https://kubernetes.io/docs/api-reference/v1.6/#container-v1-core. We stated earlier that we are setting minimum requirements of K8s 1.7 and rancher 1.6 for OOM so hopefully this isn’t a big issue.
It boils down to this:
/oom/kubernetes/config/onap-parameters.yaml is kind of like the “onap_openstackRC.env” file, and you will need to define some required values, otherwise the config pod deployment will fail.
A sample can be found here:
/oom/kubernetes/config/onap-parameters-sample.yaml
Note: If you don’t care about interacting with openstack to launch VNFs then, you can just use the sample file contents.
Continue to run createConfig.sh -n onap and it will install the config files and swap in your environment-specific values before it completes.
Then run createAll.bash -n onap to recreate your ONAP K8s environment and go from there.
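Condensed, that workflow looks roughly like this (a sketch - the directory layout is assumed to match the listings in the other comments; adjust paths to your checkout):
cd ~/oom/kubernetes/config
cp onap-parameters-sample.yaml onap-parameters.yaml   # edit with your OpenStack values, or leave the sample values if you don't need to launch VNFs
./createConfig.sh -n onap
cd ../oneclick
./createAll.bash -n onap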
Thx,
Mandeep
--
Liang Ke
Hi, ALL
1) I am trying to install ONAP on Kubernetes and encountered a problem.
I create the msb pods first with the command "./createAll.bash -n onap -a msb", then
create the aai pods with the command "./createAll.bash -n onap -a aai".
The problem is that none of the aai serviceNames and urls register to msb as expected.
I see that the aai project code has these lines:
msb.onap.org/service-info: '[
so I think msb cannot support domain names right now?
2) Also, three of the aai pods cannot be created normally.
Sathvik Manoj
Hi all,
Goal: I want to deploy and manage vFirewall router using ONAP.
I installed ONAP on Kubernetes using oom(release-1.0.0). All Services are running except DCAE as it is not yet completely implemented in Kubernetes. Also, I have an OpenStack cluster configured separately.
How can I integrate DCAE to the above Kubernetes cluster?
Thanks,
Sathvik M
Michael O'Brien
DCAE is still coming in (1.0 version in 1.1) - this component is an order of magnitude more complex than any other ONAP deployment - you can track
https://jira.onap.org/browse/OOM-176
Michael O'Brien
DCAE is in OOM Kubernetes as of 20170913
onap-dcae cdap0-4078069992-ql1fk 1/1 Running 0 41m
onap-dcae cdap1-4039904165-r8f2v 1/1 Running 0 41m
onap-dcae cdap2-422364317-827g3 1/1 Running 0 41m
onap-dcae dcae-collector-common-event-1149898616-1f8vt 1/1 Running 0 41m
onap-dcae dcae-collector-dmaapbc-1520987080-9drlt 1/1 Running 0 41m
onap-dcae dcae-controller-2121147148-1kd7f 1/1 Running 0 41m
onap-dcae dcae-pgaas-2006588677-0wlf1 1/1 Running 0 41m
onap-dcae dmaap-1927146826-6wt83 1/1 Running 0 41m
onap-dcae kafka-2590900334-29qsk 1/1 Running 0 41m
onap-dcae zookeeper-2166102094-4jgw0 1/1 Running 0 41m
Sathvik Manoj
That means DCAE is working... Is it available in 1.0 version of OOM or 1.1?
Thanks,
Sathvik M
Sathvik Manoj
Hi Michael,
As DCAE is available in OOM 1.1, I started installation of 1.1. Out of the 10 A&AI containers, 2 of them are not coming up.
Repeatedly I am seeing the below prints in
Can someone help me fix this issue?
Thanks,
Sathvik M
Vidhu Shekhar Pandey
Hi Michael,
I am using OOM version 1.1.0. I have pre-pulled all the images using prepull_docker.sh, but after creating the pods with the createAll.sh script all the pods are coming up except DCAE. Is DCAE supported in the 1.1.0 release? If not, when is it expected to be functional? Will I be able to run the vFW demo closed loop without DCAE?
More details below:
The DCAE specific images shown are:
root@hcl:~# docker images | grep dcae
nexus3.onap.org:10001/openecomp/dcae-controller 1.1-STAGING-latest ff839a80b8f1 12 weeks ago 694.6 MB
nexus3.onap.org:10001/openecomp/dcae-collector-common-event 1.1-STAGING-latest e3daaf41111b 12 weeks ago 537.3 MB
nexus3.onap.org:10001/openecomp/dcae-dmaapbc 1.1-STAGING-latest 1fcf5b48d63b 7 months ago 328.1 MB
The DCAE health check is failing
Starting Xvfb on display :88 with res 1280x1024x24
Executing robot tests at log level TRACE
==============================================================================
OpenECOMP ETE
==============================================================================
OpenECOMP ETE.Robot
==============================================================================
OpenECOMP ETE.Robot.Testsuites
==============================================================================
OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp components are...
==============================================================================
Basic DCAE Health Check | FAIL |
ConnectionError: HTTPConnectionPool(host='dcae-controller.onap-dcae', port=8080): Max retries exceeded with url: /healthcheck (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f26aee31550>: Failed to establish a new connection: [Errno -2] Name or service not known',))
------------------------------------------------------------------------------
Basic SDNGC Health Check | PASS |
------------------------------------------------------------------------------
Basic A&AI Health Check | PASS |
Thanks,
Vidhu
Michael O'Brien
Vidhu, hi, DCAE was in 1.0 of OOM on 28 Sept 2017 - however for R1/Amsterdam the new project DCAEGEN2 was only done in HEAT. There is an effort to move the containers to Kubernetes, an effort to use the developer setup with 1 instead of 7 cdap hadoop nodes, and an effort to complete the bridge for the hybrid HEAT/Kubernetes setup - specific only to DCAEGEN2. One or more of these should be in shortly as we work with the DCAE team. You are welcome to help both teams with this large effort.
thank you
/michael
I Chen
Hi Michael,
Just curious, is DCAEGEN2 now available?
While oneclick/createAll.bash includes DCAEGEN2 pod creation, the automation script cd.sh hits the ERROR condition when creating DCAEGEN2 because createAll.bash expects /home/ubuntu/.ssh/onap_rsa to exist. Here's some output from the console log of one of today's Jenkins runs (http://jenkins.onap.info/job/oom-cd/1853/consoleFull):
Michael O'Brien
Yes, DCAEGEN2 works via OOM - I verified it last Friday. However, only in the Amsterdam release with the proper onap-parameters.yaml (it will be ported to Beijing/master shortly).
see details on
https://lists.onap.org/pipermail/onap-discuss/2018-February/008059.html
The CD jenkins job is running master for now - where DCAEGEN2 is expected not to work yet.
try amsterdam.
/michael
Mor Dabastany
Hi,
I sense that there is a bit of a lack of information here, which I would be happy to acquire.
There is a file that describes the ONAP environment, "onap-parameters.yaml". I think it would be good practice to provide guidance on how to fill it in (or how to acquire the values that should reside in it).
Michael O'Brien, is there any available document about it?
Michael O'Brien
Mor, You are welcome to help us finish the documentation for OOM-277
The config was changed on Friday - those of us here are playing catch-up on some of the infrastructure changes as we are testing the deploys every couple of days - you are welcome to add to the documentation here - usually the first to encounter an issue/workaround documents it, so the rest of us can benefit.
Most of the content on this tutorial is added by developers like yourself that would like to get OOM deployed and fully functional - at ONAP we self document anything that is missing
OOM-277 - Getting issue details... STATUS
There was a section added on Friday for those switching from the old-style config to the new - you run a helm purge.
The configuration parameters will be specific to your Rackspace/OpenStack config - usually you match your rc export. There is a sample posted from before, when it was in the json file in mso - see the screen cap.
The major issue is that so far no one using pure public ONAP has actually deployed a vFirewall yet (mostly due to stability issues with ONAP that are being fixed).
./michael
Michael O'Brien
TODO
good to go : 20170913:2200h
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 37m
onap-aai aai-service-3321436576-790w2 0/1 Init:0/1 1 34m
onap-aai aai-traversal-2747464563-pb8ns 0/1 Running 0 34m
onap-appc appc-dgbuilder-2616852186-htwkl 0/1 Running 0 35m
onap-dcae dmaap-1927146826-6wt83 0/1 Running 0 34m
onap-policy brmsgw-3913376880-qznzv 0/1 Init:0/1 1 35m
onap-policy drools-873246297-twxtq 0/1 Init:0/1 1 35m
onap-policy pap-1694585402-hwkdk 0/1 PodInitializing 0 35m
onap-policy pdp-3638368335-l00br 0/1 Init:0/1 1 35m
onap-portal vnc-portal-1920917086-0q786 0/1 Init:1/5 1 35m
onap-sdc sdc-be-1839962017-16zc3 0/1 Init:0/2 1 34m
onap-sdc sdc-fe-3467675014-qp7f5 0/1 Init:0/1 1 34m
onap-sdc sdc-kb-1998598941-6z0w2 0/1 PodInitializing 0 34m
onap-sdnc sdnc-dgbuilder-3446959187-lspd6 0/1 Running 0 35m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 39m
onap-policy brmsgw-3913376880-qznzv 0/1 Init:0/1 1 36m
onap-policy drools-873246297-twxtq 0/1 Init:0/1 1 36m
onap-policy pdp-3638368335-l00br 0/1 PodInitializing 0 36m
onap-portal vnc-portal-1920917086-0q786 0/1 Init:2/5 1 36m
onap-sdc sdc-be-1839962017-16zc3 0/1 PodInitializing 0 36m
onap-sdc sdc-fe-3467675014-qp7f5 0/1 Init:0/1 1 36m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 40m
onap-policy drools-873246297-twxtq 0/1 PodInitializing 0 38m
onap-portal vnc-portal-1920917086-0q786 0/1 Init:2/5 1 38m
onap-sdc sdc-fe-3467675014-qp7f5 0/1 Running 0 38m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 41m
onap-policy drools-873246297-twxtq 0/1 PodInitializing 0 39m
onap-portal vnc-portal-1920917086-0q786 0/1 Init:3/5 1 39m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 42m
onap-policy drools-873246297-twxtq 0/1 Running 0 40m
onap-portal vnc-portal-1920917086-0q786 0/1 PodInitializing 0 40m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 42m
onap-portal vnc-portal-1920917086-0q786 0/1 PodInitializing 0 40m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 43m
onap-portal vnc-portal-1920917086-0q786 0/1 PodInitializing 0 40m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 43m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-7212s 1/1 Running 1 1d
kube-system kube-dns-2514474280-lmr1k 3/3 Running 3 1d
kube-system kubernetes-dashboard-716739405-qfjgd 1/1 Running 1 1d
kube-system monitoring-grafana-3552275057-gj3x8 1/1 Running 1 1d
kube-system monitoring-influxdb-4110454889-2dq44 1/1 Running 1 1d
kube-system tiller-deploy-737598192-46l1m 1/1 Running 2 1d
onap-aai aai-resources-3302599602-c894z 1/1 Running 0 41m
onap-aai aai-service-3321436576-790w2 1/1 Running 0 41m
onap-aai aai-traversal-2747464563-pb8ns 1/1 Running 0 41m
onap-aai data-router-1397019010-fwqmz 1/1 Running 0 41m
onap-aai elasticsearch-2660384851-chf2n 1/1 Running 0 41m
onap-aai gremlin-1786175088-smqgx 1/1 Running 0 41m
onap-aai hbase-3880914143-9cksj 1/1 Running 0 41m
onap-aai model-loader-service-226363973-nlcnm 1/1 Running 0 41m
onap-aai search-data-service-1212351515-5wkb2 1/1 Running 0 41m
onap-aai sparky-be-2088640323-xs1dg 1/1 Running 0 41m
onap-appc appc-1972362106-lx2t0 1/1 Running 0 41m
onap-appc appc-dbhost-4156477017-9vbf9 1/1 Running 0 41m
onap-appc appc-dgbuilder-2616852186-htwkl 1/1 Running 0 41m
onap-dcae cdap0-4078069992-ql1fk 1/1 Running 0 41m
onap-dcae cdap1-4039904165-r8f2v 1/1 Running 0 41m
onap-dcae cdap2-422364317-827g3 1/1 Running 0 41m
onap-dcae dcae-collector-common-event-1149898616-1f8vt 1/1 Running 0 41m
onap-dcae dcae-collector-dmaapbc-1520987080-9drlt 1/1 Running 0 41m
onap-dcae dcae-controller-2121147148-1kd7f 1/1 Running 0 41m
onap-dcae dcae-pgaas-2006588677-0wlf1 1/1 Running 0 41m
onap-dcae dmaap-1927146826-6wt83 1/1 Running 0 41m
onap-dcae kafka-2590900334-29qsk 1/1 Running 0 41m
onap-dcae zookeeper-2166102094-4jgw0 1/1 Running 0 41m
onap-message-router dmaap-3565545912-2f19k 1/1 Running 0 41m
onap-message-router global-kafka-3548877108-ns5v6 1/1 Running 0 41m
onap-message-router zookeeper-2697330950-9fbmf 1/1 Running 0 41m
onap-mso mariadb-2019543522-nqqbz 1/1 Running 0 41m
onap-mso mso-2505152907-pg17g 1/1 Running 0 41m
onap-policy brmsgw-3913376880-qznzv 1/1 Running 0 41m
onap-policy drools-873246297-twxtq 1/1 Running 0 41m
onap-policy mariadb-922099840-x5xsq 1/1 Running 0 41m
onap-policy nexus-2268491532-025jf 1/1 Running 0 41m
onap-policy pap-1694585402-hwkdk 1/1 Running 0 41m
onap-policy pdp-3638368335-l00br 1/1 Running 0 41m
onap-portal portalapps-3572242008-qr51z 1/1 Running 0 41m
onap-portal portaldb-2714869748-wxtvh 1/1 Running 0 41m
onap-portal portalwidgets-1728801515-33bm7 1/1 Running 0 41m
onap-portal vnc-portal-1920917086-0q786 1/1 Running 0 41m
onap-robot robot-1085296500-d3l2g 1/1 Running 0 41m
onap-sdc sdc-be-1839962017-16zc3 1/1 Running 0 41m
onap-sdc sdc-cs-428962321-z87js 1/1 Running 0 41m
onap-sdc sdc-es-227943957-5ssh3 1/1 Running 0 41m
onap-sdc sdc-fe-3467675014-qp7f5 1/1 Running 0 41m
onap-sdc sdc-kb-1998598941-6z0w2 1/1 Running 0 41m
onap-sdnc sdnc-250717546-476sv 1/1 Running 0 41m
onap-sdnc sdnc-dbhost-2348786256-wsf9z 1/1 Running 0 41m
onap-sdnc sdnc-dgbuilder-3446959187-lspd6 1/1 Running 0 41m
onap-sdnc sdnc-portal-4253352894-73mzq 1/1 Running 0 41m
onap-vid vid-mariadb-2940400992-twp1r 1/1 Running 0 41m
onap-vid vid-server-377438368-mkgpc 1/1 Running 0 41m
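If you are following the same startup loop, a simple watch avoids retyping the grep until only the config pod remains at 0/1 (just a convenience sketch):
watch -n 30 "kubectl get pods --all-namespaces | grep 0/1"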
Cyril Nleng
Hi,
I just went through the installation tutorial:
1 - I am wondering how OpenStack impacts ONAP operations?
2 - When will the DCAE component be available on Kubernetes?
Thanks,
Michael O'Brien
DCAE is in OOM Kubernetes as of 20170913
onap-dcae cdap0-4078069992-ql1fk 1/1 Running 0 41m
onap-dcae cdap1-4039904165-r8f2v 1/1 Running 0 41m
onap-dcae cdap2-422364317-827g3 1/1 Running 0 41m
onap-dcae dcae-collector-common-event-1149898616-1f8vt 1/1 Running 0 41m
onap-dcae dcae-collector-dmaapbc-1520987080-9drlt 1/1 Running 0 41m
onap-dcae dcae-controller-2121147148-1kd7f 1/1 Running 0 41m
onap-dcae dcae-pgaas-2006588677-0wlf1 1/1 Running 0 41m
onap-dcae dmaap-1927146826-6wt83 1/1 Running 0 41m
onap-dcae kafka-2590900334-29qsk 1/1 Running 0 41m
onap-dcae zookeeper-2166102094-4jgw0 1/1 Running 0 41m
Cyril Nleng
Which branch are those changes available in?
Kiran Kamineni
Is there any reason for using port 8880 instead of port 8080 when installing Rancher?
Port 8880 seems to be blocked in our environment and using 8080 was working fine. I hope I will not run into other issues because I am using 8080?
Borislav Glozman
Kiran Kamineni, You can use whatever port you prefer. It should cause no issues.
Mohamed Aly ould Oumar
Hi, I managed to install all ONAP components using Kubernetes; they seem to be running and I can access the Portal and authenticate.
Problem:
I cannot access SDC. It always gives the error "Sorry, you are not authorized to view this page, contact ...the administrators".
I tried with all the available users (demo, cs0008, jh0003) but none of them works.
Can I get few bits of help regarding this?
Thanks in advance.
Mohamed Aly, Aalto University.
Borislav Glozman
Please try accessing it from the VNC. (<your node IP>:30211).
Shane Daniel
I am having the same issue as Mohamed. I am accessing it via the VNC portal on port 30211
Mohamed Aly ould Oumar
Hi, thank you for your reply. I'm accessing it from the VNC node on port 30211; it doesn't work though and gives the same error.
Any update on this issue??
Mike Elliott
First verify that your portal containers are running in K8s (including the vnc-portal). Take note of the 2/2 and 1/1 Ready states. If there is a 0 on the left of those numbers, the container is not fully running.
kubectl get pods --all-namespaces -o=wide
onap-portal portalapps-4168271938-gllr1 2/2 Running
onap-portal portaldb-2821262885-rs4qj 2/2 Running
onap-portal portalwidgets-1837229812-r8cn2 1/1 Running
onap-portal vnc-portal-2366268378-c71z9 1/1 Running
If the containers (pods) are in a good state, ensure your k8s host has a routable IP address and substitute it into the example URL below:
http://<ip address>:30211/vnc.html?autoconnect=1&autoscale=0&quality=3
Mohamed Aly ould Oumar
This is not our problem, thanx anyway.
Mandeep Singh
I am also facing the same issue.
From the Wireshark logs, the GET /sdc2/rest/version API is having some issues.
The pods seem to be running fine:
onap1-portal portaldb-3931461499-x03wg 2/2 Running 0 1h
onap1-portal portalwidgets-3077832546-jz647 1/1 Running 0 1h
onap1-portal vnc-portal-3037811218-hj3wj 1/1 Running 0 1h
onap1-sdc sdc-be-3901137770-h7d65 2/2 Running 0 1h
onap1-sdc sdc-cs-372240393-kqlw7 1/1 Running 0 1h
onap1-sdc sdc-es-140478562-r1fx9 1/1 Running 0 1h
onap1-sdc sdc-fe-3405834798-pjvkh 2/2 Running 0 1h
onap1-sdc sdc-kb-3782380369-hzb6q 1/1 Running 0 1h
Mohamed Aly ould Oumar
Any update on this issue??
Samuel Robillard
Hi, is there a page available where we could find any sort of updated list/diagram of the dependencies between the different onap components? Also is there a breakdown of the memory requirements for the various oom components?
Mike Elliott
Hi Samuel,
No official documentation on the dependencies at this point. But a very good idea to add. I will look into doing this.
For now you can see the dependencies in each of the deployment descriptors, like in the AAI traversal example (see below) that depends on the aai-resources and hbase containers before it starts up. In OOM we make use of Kubernetes init-containers and readiness probes to implement the dependencies. This prevents the main container in the deployment descriptor from starting until its dependencies are "ready".
oom/kubernetes/aai/templates] vi aai-traversal-deployment.yaml
pod.beta.kubernetes.io/init-containers: '[
{
"args": [
"--container-name",
"hbase",
"--container-name",
"aai-resources"
],
"command": [
"/root/ready.py"
],
"env": [
{
"name": "NAMESPACE",
"valueFrom": {
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.namespace"
}
}
}
],
"image": "{{ .Values.image.readiness }}",
"imagePullPolicy": "{{ .Values.pullPolicy }}",
"name": "aai-traversal-readiness"
}
]'
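If a pod is stuck in an Init state because of one of these dependencies, the readiness init container's log tells you which dependency it is still waiting for - a sketch using the aai-traversal pod as an example (substitute your own pod name):
kubectl get pods -n onap-aai                                        # look for Init:0/1 states
kubectl logs <aai-traversal-pod> -c aai-traversal-readiness -n onap-aai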
Michael O'Brien
Samuel, To add to the dependency discussion by Mike - Ideally I would like to continue the deployment diagram below with the dependencies listed in the yamls he refers to
The diagram can be edited by anyone - I will take time this week and update it.
Overall Deployment Architecture#Version1.1.0/R1
/michael
Gopinath Taget
There seems to be an error in the VFC service definition template when creating all services on an ubuntu 16.04 with 64 GB RAM:
Creating namespace **********
namespace "onap-vfc" created
Creating registry secret **********
secret "onap-docker-registry-key" created
Creating deployments and services **********
Error: yaml: line 27: found unexpected end of stream
The command helm returned with error code 1
Michael O'Brien
Gopinath,
Hi, VFC is still a work in progress - the VFC team is working through issues with their containers. You don't currently need VFC for ONAP to function - you can comment it out of the oneclick/setenv.bash helm line (ideally we would leave out services that are still WIP).
thank you
/michael
Gopinath Taget
Thanks Michael O'Brien!
Gopinath Taget
Hi Michael,
Checking back to see if VFC container issues are resolved and I can continue with the full install including other components?
Thanks!
Gopinath
Rajesh Mangal
Hi,
I am trying to bring up ONAP using Kubernetes. Can you please tell me whether I should pull only OOM release-1.0.0, or whether a pull from the master branch should also be fine, to get ONAP up & running and also to run the demo on it?
Thanks!
Michael O'Brien
Rajesh, Hi, the latest master is 1.1/R1 - the wiki is now targeting 1.1 - I'll remove the 1.0 link. Be aware that ONAP in general is undergoing stabilization at this point.
/michael
Samuel Robillard
Hi,
I am getting the same error as a few people above when it comes to accessing SDC where it says I am not authorized to view this page, and it also gives me a 500 error. My initial impression is that this might be because I cannot reach the IP corresponding to the sdc.api.simpledemo.openecomp.org in the /etc/hosts file from my vnc container.
Could anybody confirm if this may cause an issue? And if so, which container/host/service IP should be paired with the sdc url?
Thanks,
Sam
Samuel Robillard
Actually, I believe the resolution is correct, as it maps to the sdc-fe service, and if I change the IP to any other service the sdc web page times out. Also, if I curl <sdc-url>:8080 I do get information back. I am still not sure what might be causing this issue. Currently I am trying to look through the sdc logs for hints, but no luck as of yet.
Samuel Robillard
The request is failing on the sdc-fe side. I posted the outputs of a tcpdump from the sdc-fe container here https://pastebin.com/bA46vqUk
Michael O'Brien
There are general SDC issues - I'll look them up and paste them. We are also investigating issues with the sdc-be container
see
SDC-451 - Getting issue details... STATUS
and
INT-106 - Getting issue details... STATUS
Syed Atif Husain
is there a workaround for this issue of accessing SDC where it says I am not authorized to view this page?
kowsalya v
I am also facing the same SDC issue due to sdc-es not being ready.
sdc-es shows the below error in the log:
[2017-10-11T17:49:50+05:30] INFO: HTTP Request Returned 404 Not Found: Object not found: chefzero://localhost:8889/environments/AUTO
10/11/2017 5:49:50 PM
10/11/2017 5:49:50 PM================================================================================
10/11/2017 5:49:50 PMError expanding the run_list:
10/11/2017 5:49:50 PM================================================================================
10/11/2017 5:49:50 PMUnexpected API Request Failure:
10/11/2017 5:49:50 PM-------------------------------
10/11/2017 5:49:50 PMObject not found: chefzero://localhost:8889/environments/AUTO
10/11/2017 5:49:50 PMPlatform:
10/11/2017 5:49:50 PM---------
10/11/2017 5:49:50 PMx86_64-linux
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] ERROR: Running exception handlers
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] ERROR: Exception handlers complete
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] FATAL: Stacktrace dumped to /root/chef-solo/cache/chef-stacktrace.out
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] ERROR: 404 "Not Found"
10/11/2017 5:49:51 PM[2017-10-11T17:49:50+05:30] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: Started chef-zero at chefzero://localhost:8889 with repository at /root/chef-solo
10/11/2017 5:49:52 PM One version per cookbook
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: Forking chef instance to converge...
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: *** Chef 12.19.36 ***
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: Platform: x86_64-linux
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: Chef-client pid: 927
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] INFO: Setting the run_list to ["role[elasticsearch]"] from CLI options
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] WARN: Run List override has been provided.
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] WARN: Original Run List: [role[elasticsearch]]
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] WARN: Overridden Run List: [recipe[sdc-elasticsearch::ES_6_create_kibana_dashboard_virtualization]]
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] INFO: HTTP Request Returned 404 Not Found: Object not found: chefzero://localhost:8889/environments/AUTO
10/11/2017 5:49:53 PM
Michael O'Brien
R1 is still under RC0 fix mode as we prep for the release - pull yesterday's (13th).
Mandeep's
https://gerrit.onap.org/r/#/c/18803/
fixes
OOM-359 - Getting issue details... STATUS
SDC-451 - Getting issue details... STATUS
and some of
OOM-110 - Getting issue details... STATUS
actually those are for sdc-be; I see a chef error on sdc-es - but the pod starts up ok (need to verify the endpoints though) - also this pod is not slated for the elk filebeat sister container - it should not be
getting a chef exit on missing elk components in sdc-es - even though this one is not slated for the sister filebeat container - likely a reused script across all pods in sdc - will take a look
see original OOM-110 commit
https://gerrit.onap.org/r/#/c/15941/1
Likely we can ignore this one in sdc-es - need to check endpoints though - pod comes up ok - regardless of the failed cookbook.
root@obriensystemsu0:~/onap/oom/kubernetes/oneclick# kubectl logs -f -n onap-sdc sdc-es-2514443912-nt3r3
/michael
Michael O'Brien
todo add to devops
root@obriensystemsu0:~/onap/oom/kubernetes/oneclick# kubectl logs -f -n onap-aai aai-traversal-3982333463-vb89g aai-traversal
Cloning into 'aai-config'...
[2017-10-14T10:50:36-05:00] INFO: Started chef-zero at chefzero://localhost:1 with repository at /var/chef/aai-config
One version per cookbook
environments at /var/chef/aai-data/environments
[2017-10-14T10:50:36-05:00] INFO: Forking chef instance to converge...
Starting Chef Client, version 13.4.24
[2017-10-14T10:50:36-05:00] INFO: *** Chef 13.4.24 ***
[2017-10-14T10:50:36-05:00] INFO: Platform: x86_64-linux
[2017-10-14T10:50:36-05:00] INFO: Chef-client pid: 43
[
Vijendra Rajput
Hi Michael,
I am trying to set up ONAP using Kubernetes. I am using Rancher to set up the Kubernetes cluster. I have 5 machines with 16GB memory each. I configured Kubernetes successfully. When I run createAll.bash to set up the ONAP application, some of the components are successfully configured and running, but some of the components fail with an "ImagePullBackOff" error.
When I try to pull images independently I am able to download images from nexus successfully, but not when running through the createAll script. When I went through the script everything seemed fine and I am not able to understand what is wrong. Could you please help me understand the issue?
~Vijendra
Michael O'Brien
Vijendra,
Hi, try running the docker pre-pull script on all of your machines first. Also, you may need to duplicate /dockerdata-nfs across all machines - manually or via a shared drive.
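A rough sketch of what that looks like across nodes (script location and host names are assumptions - adjust to your environment):
# on every K8s host
./prepull_docker.sh
# copy the shared config from the host that ran the config step to the others (or use the NFS mount described earlier in the comments)
rsync -avz /dockerdata-nfs/ <other-node>:/dockerdata-nfs/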
/michael
Samuel Robillard
Hi,
I started getting an error with the MSO when I redeployed yesterday
Starting Xvfb on display :88 with res 1280x1024x24
Executing robot tests at log level TRACE
==============================================================================
OpenECOMP ETE
==============================================================================
OpenECOMP ETE.Robot
==============================================================================
OpenECOMP ETE.Robot.Testsuites
==============================================================================
.
.
.
------------------------------------------------------------------------------
Basic SDNGC Health Check | PASS |
------------------------------------------------------------------------------
Basic A&AI Health Check | PASS |
------------------------------------------------------------------------------
Basic Policy Health Check | PASS |
------------------------------------------------------------------------------
Basic MSO Health Check | FAIL |
503 != 200
------------------------------------------------------------------------------
Basic ASDC Health Check | PASS |
------------------------------------------------------------------------------
Basic APPC Health Check | PASS |
------------------------------------------------------------------------------
Basic Portal Health Check | PASS |
------------------------------------------------------------------------------
Basic Message Router Health Check | PASS |
------------------------------------------------------------------------------
Basic VID Health Check | PASS |
------------------------------------------------------------------------------
Basic Microservice Bus Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp compo... | FAIL |
11 critical tests, 8 passed, 3 failed
11 tests total, 8 passed, 3 failed
==============================================================================
OpenECOMP ETE.Robot.Testsuites | FAIL |
11 critical tests, 8 passed, 3 failed
11 tests total, 8 passed, 3 failed
==============================================================================
OpenECOMP ETE.Robot | FAIL |
11 critical tests, 8 passed, 3 failed
11 tests total, 8 passed, 3 failed
==============================================================================
OpenECOMP ETE | FAIL |
11 critical tests, 8 passed, 3 failed
11 tests total, 8 passed, 3 failed
==============================================================================
Output: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_11572/output.xml
Log: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_11572/log.html
Anybody else get this error/may know how to determine the root cause of this?
Michael O'Brien
Yes, we have been getting this since last Friday - I have been too busy to raise an issue like normal - this is not as simple as onap-parameters.yaml; it looks like a robot change related to the SO rename - will post a JIRA/workaround shortly. Anyway, SO is not fully up on OOM/Heat currently.
20171019 - see the same thing on rackspace today
Also - nice dependency diagram you started.
/michael
Edmund Haselwanter
Same here. The health check is failing. Seeing this in OOM as well as heat_openstack. SO-246 - Getting issue details... STATUS
Radhika Kaslikar
Hi,
I have brought up ONAP using the OOM master branch which I pulled yesterday. But on running the health check I am facing similar issues as discussed above, where MSO fails with a 503 error, and I also see Portal failing with a 404 error.
Can you please let us know if there is any workaround for this issue, or if there is any build where the necessary components for running the vFW/vDNS demos (Portal, SDC, AAI, SO, VID, SDNC, Policy and DCAE) are healthy.
Thanks,
Radhika
Michael O'Brien
MSO, APPC, SDNC, Policy regularly pass/fail on a daily basis - as we are in branch stabilization mode for R1 - join the triage party below
INT-106 - Getting issue details... STATUS
/michael
Edmund Haselwanter
Michael O'Brien
of course, but in the spirit of "open" source - everything has access - hence 777 everywhere - until production deployments that is!
Edmund Haselwanter
how do I set/correct the missing values in the health check? How do I know if everything should be working with a current deployment?
Rahul Sharma
For the MSO Basic HealthCheck failure, see if the last comment in this JIRA helps: https://jira.onap.org/browse/SO-208?focusedCommentId=15724&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15724
Michael O'Brien
MSO passed for 24 hours last tue - it was a good day! I predict more of these next week
stay tuned to channel 106 - INT-106 - Getting issue details... STATUS
Radhika Kaslikar
Hi ,
On running health check, MSO is still failing
Basic MSO Health Check | FAIL |
503 != 200
And on checking the MSO container logs I see the following error :
2017-11-06 19:14:21,766||ServerService Thread Pool -- 75|contextInitialized||||ERROR|AvailabilityError|| MSO-RA-5210E Configuration error:Unknown. MSO Properties failed to initialize completely
2017-11-06 19:14:21,786||ServerService Thread Pool -- 75|contextInitialized||||ERROR|AvailabilityError|| MSO-GENERAL-9400E Exception: java.lang.NullPointerException - at org.openecomp.mso.openstack.utils.CloudConfigInitializer.contextInitialized(CloudConfigInitializer.java:65) -
Can anyone please tell me how I can solve this?
Before running the health check all the pods were in the running state.
Beili Zhou
Michael O'Brien:
In the "Delete/Rerun config-init container for /dockerdata-nfs refresh" section, the steps for deleting the fs are as follows:
This would not be good for the case of a cluster configuration where the `dockerdata-nfs` directory is mounted as per the suggestion in "Cluster Configuration (optional - do not use if your server/client are co-located)":
The description in OOM-257 (DevOps: OOM config reset procedure for new /dockerdata-nfs content) is friendlier, where the step is described as
Michael O'Brien
A persistent NFS mount is recommended in the official docs - this is a collaborative wiki - as in join the party of overly enthusiastic developers - in my case I run on AWS EBS so not an issue - you are welcome to help document the ecosystem.
The sky at OOM is a very nice shade of blue!
Sorry I am super excited about the upcoming developer conference on 11 Dec.
/michael
ramki krishnan
Hi Michael,
In my setup, I am able to start the ONAP components only if all the images already are downloaded using prepull_docker.sh. So far, I have been able to start all aai components using "createAll.bash -n onap -a aai" after the images have been downloaded using prepull_docker.sh.
Here are the challenges I am facing
onap-clamp clamp-2925721051-g814q 0/1 CrashLoopBackOff 144 12h
onap-consul consul-agent-3312409084-lwdvv 0/1 CrashLoopBackOff 162 13h
onap-consul consul-server-1173049560-mk40v 0/1 CrashLoopBackOff 163 13h
onap-consul consul-server-1173049560-pjpm5 0/1 CrashLoopBackOff 163 13h
onap-consul consul-server-1173049560-rf257 0/1 CrashLoopBackOff 163 13h
onap-vfc vfc-workflow-2530549902-19tw0 0/1 CrashLoopBackOff 166 13h
Your suggestions on next steps are much appreciated.
Thanks,
Ramki
Beili Zhou
@ramki krishnan
You can use the following command to check the logs for why the pod failed with `CrashLoopBackOff`.
In your case, the command would be:
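A likely form, using the clamp pod name and namespace from the listing above:
kubectl logs -n onap-clamp clamp-2925721051-g814q
kubectl logs -n onap-clamp clamp-2925721051-g814q --previous   # if the container has already restarted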
ramki krishnan
Thanks Beili. Below is the error I get for clamp. Looks like clamp is expecting some configuration, specifically password. Any clues on the specific configuration which needs to be updated?
***************************
APPLICATION FAILED TO START
***************************
Description:
Binding to target org.onap.clamp.clds.config.EncodedPasswordBasicDataSource@53ec2968 failed:
Property: spring.datasource.camunda.password
Value: strong_pitchou
Reason: Property 'password' threw exception; nested exception is java.lang.NumberFormatException: For input string: "st"
Action:
Update your application's configuration
Michael O'Brien
Use the recommended subset (essentially the ONAP 1.0 components from the original seed code in Feb 2017 - these work with the vFirewall use case) until we stabilize the R1 release.
Clamp, aaf, and vfc are currently still being developed - there are usually a couple of pod failures in these components - I will post the JIRAs. These are known issues and are being worked on in the OOM JIRA board.
https://jira.onap.org/secure/RapidBoard.jspa?rapidView=41&view=planning&selectedIssue=OOM-150
OOM-333 - Getting issue details... STATUS
OOM-324 - Getting issue details... STATUS
OOM-408 - Getting issue details... STATUS
You don't need these 3 components to run the vFirewall - for now I would exclude them in HELM_APPS in setenv.bash - later when they are stable you can add them back.
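A sketch of what that looks like in oneclick/setenv.bash (the exact default list varies by branch - this is just an illustration of dropping clamp, aaf and vfc):
HELM_APPS=('consul' 'msb' 'mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'log' 'cli' 'multicloud' 'vnfsdk' 'uui' 'kube2msb')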
ramki krishnan
Many thanks Michael.
Alex Lee
Hi, @Michael O'Brien. As we can see in https://git.onap.org/integration/tree/version-manifest/src/main/resources/docker-manifest.csv,
all the docker image tags are changing to the R1 release. But right now, the images for OOM/master are still using the tag 1.1-STAGING-latest.
Michael O'Brien
Yes, I've been thinking about this for some time - and I have seen issues where we don't pick up problems we should have, for example with the openecomp to onap refactor earlier this week. As you know from the TSC meeting yesterday, the manifest is still in flux in the move to the dockerhub versions.
OOM-432 - Getting issue details... STATUS
OOM-438 - Getting issue details... STATUS
Alex, can't tell your company from your email - you are welcome in the Wed 10EDT OOM meeting where we can farm out work items like this.
thank you
/michael
Alex Lee
Thanks for your explanations Michael O'Brien.
Another question: when the docker images for the Amsterdam release are ready, will the docker repo for ONAP still be nexus3 at onap?
Because in
you are moving some images to the new repo called onap.
Michael O'Brien
I am not sure yet - but I would expect that master continues to pull from nexus/nexus3, and the R1 branch pulls from dockerhub - but need to verify - put a watch on the JIRA - I usually update them with critical info/links/status
/michael
Alex Lee
ok. thanks a lot, michael
Michael O'Brien
Stay with helm v2.3 - do not upgrade to 2.6 or vnc-portal will fail - see OOM-441 - Getting issue details... STATUS
Syed Atif Husain
I have successfully started ONAP on Kubernetes with the below apps in setenv.sh. All pods show 1/1 Running, but when I log in to the portal I only see SDC. Why are the other modules not appearing in the portal?
HELM_APPS=('consul' 'msb' 'mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'log' 'cli' 'multicloud' 'clamp' 'vnfsdk' 'uui' 'aaf' 'vfc' 'kube2msb')
Rahul Sharma
Syed Atif Husain: Are you logged on as demo user?
Syed Atif Husain
Rahul Sharma I tried cs0008, the catalog designer role
Rahul Sharma
Syed Atif Husain: That would only show SDC. Try using demo/demo123456!
Syed Atif Husain
Thanks Rahul Sharma. I have encountered another issue: SDC keeps giving me a 500 error saying I am not authorized to view this page when I log in as cs0008. I see in the comments above that this is a known issue. Is there a workaround for this, or can I pull older/stable code to avoid it?
tuan nguyen
This is a great accomplishment for us to start playing with - thanks a lot Amar and Prakash for your effort putting things together. One thing I mentioned earlier in the call: we probably need to review and stop using Docker 1.12 (2 years old), as Docker moved to 1.13 last year and is now split into Docker CE (public) and Docker EE (Enterprise), with version numbers starting at 17.x (2017 = 17.x, 2018 = 18.x). Also, Rancher is not mandatory just to build Kubernetes - I have met several customers in production environments where we can build Kubernetes 1.6, 1.7 or 1.8 quite easily now using kubeadm in a few minutes (skipping Rancher). I mean Rancher is good for other use cases where customers need a multi-orchestrator environment (K8s, Mesos, Swarm). I don't see real value in having Rancher in our ONAP document, as it might confuse people into thinking that Rancher is mandatory just for bringing up K8s. Another thing: I was attending the last Docker conference - Kubernetes will soon support containerd, in which case the CLI command to run will be "crictl" and not "kubectl" anymore, allowing Kubernetes to work directly with containerd and thus improving performance for Kubernetes, which ONAP will fully benefit from (GA will be end of 2017). We probably need to closely follow where the Kubernetes community is heading and update our documentation accordingly. It is kind of difficult to update our documentation every month, but keeping up with Kubernetes is a good way to catch up in my opinion...
Michael O'Brien
Good discussion,
I agree - we will move from Docker 1.12 when we move from Rancher 1.6.10 to Rancher 2.0 - where we can use 17.x - but it is a Rancher + Docker + Kubernetes config issue.
Rancher is not required - we tried minikube, there are also commercial CaaS frameworks - however Rancher is the simplest and fastest approach at the moment.
You are welcome to join the OOM call at 10AM EDT on Wed - we usually go through the JIRA board - and the Kubeadm work sounds like a good Epic to work on. We are very interested in various environments and alternatives for running our pods - please join.
There is also a daily OOM blitz on stabilizing the branch and deploying the vFirewall use case that you are welcome to attend
1200EDT noon until either the 4th Dec KubeCon or the 11 dec ONAP developer conference.
https://lists.onap.org/pipermail/onap-discuss/2017-November/006483.html
I updated the page to state that Rancher is just "one" way to get your pods up - add any subpages for other types of frameworks as you wish.
/michael
tuan nguyen
great job Michael- hope we can have more and more from people trying ONAP and giving more feedback from people too and great contributors like you!
Sen Shu
Hi all.
I have a question.
On the page for installation using HEAT, 148 vCPUs are needed, but this page describes 64 vCPUs as needed.
Why is there such a big difference? Are there differences in the items that can be installed?
best regards
sen
Michael O'Brien
Sen,
Good question. As you know, CPU can be over-provisioned - threads will just queue more, unlike RAM and HD which cannot be shared. 64 vCPUs is a recommended number of vCPUs based on bringing up the system on 64 and 128 core systems on AWS - we top out at 44 cores during startup (without DCAE - so this may be multiplied by 3/2 in that case, as DCAE has 1/3 of the containers in ONAP). Therefore, for non-staging/non-production systems you will not gain anything by having more than 44 vCores until we start hammering the system with real-world VNF traffic. The HEAT provisioning is a result of the fact that the docker allocation model is spread across multiple silo VMs and not flat like in Kubernetes currently. Therefore some servers may only use 1/8 where others may peak at 7/8. It all depends on how you use ONAP.
You can get away with 8 vCores during development - ONAP will start up in 11m instead of 7m on 32 vCores.
Since DCAE is not currently in Kubernetes in R1 - then you need to account for it only in openstack.
Depending on the VNF use case you don't need the whole system yet, for example the vFW only needs 1.0.0. era components, where vVolte and vCPE will need new R1 components - see the HELM_APPS recommendation in this wiki.
Similar for an ONAP HEAT deployment (without DCAE or the OPEN-O VM - triple the size in that case) - this will run the vFirewall but not the closed loop.
/michael
Sen Shu
michael,
Thank you for answering my question.
It makes it easier for me to understand.
I'll use the HEAT installation and temporarily allocate 148 vCPUs because I need to use DCAE.
I'll also look at the page you referenced.
thanks
Sen
Joey Sullivan
I was getting the following error when running "./createConfig.sh -n onap"
There was something wrong with helm tiller rbac config.
I found the solution here
https://github.com/kubernetes/helm/issues/3130
https://docs.bitnami.com/kubernetes/how-to/configure-rbac-in-your-kubernetes-cluster/
This is what I did to fix the issues in my deployment.
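For reference, the fix described in those links boils down to giving tiller a service account with cluster-admin rights and re-initializing helm - roughly along these lines (a sketch; verify against the linked docs and your cluster's RBAC policy):
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --upgrade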
Vaibhav Chopra
1) Whenever you delete the configuration with
# helm delete --purge onap-config
release "onap-config" deleted
It deletes the config pod, but you do need to delete the namespace as well for complete cleanup:
kubectl delete namespace onap
2) Another observation is with the kubectl version:
Currently the below command installs the latest version, 1.8.4:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
To download a specific version, replace the
$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)
portion of the command with the specific version. I think the difference between the two versions is in the init container handling: in v1.8.3 it waits for the dependent container to come up, and sometimes the dependent container times out for me (like vnc-portal) -
for example, drools checking for brmsgw to become up:
2017-11-27 08:16:46,757 - INFO - brmsgw is not ready.
2017-11-27 08:16:51,759 - INFO - Checking if brmsgw is ready
2017-11-27 08:16:51,826 - INFO - brmsgw is not ready.
2017-11-27 08:16:56,831 - INFO - Checking if brmsgw is ready
2017-11-27 08:16:56,877 - INFO - brmsgw is ready!
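For example, to pin v1.8.3 instead of whatever stable.txt currently points to (same URL as above with the version substituted; the install path is an assumption):
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.8.3/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/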
Michael O'Brien
Your 1.8.4 vs 1.8.3 version observation is good - we have issues with vnc-portal under the latest 1.8.8 - will look more into this - thank you
see OOM-441 - Getting issue details... STATUS if you would like to comment
/michael
ATUL ANGRISH
Hi Michael,
I am trying to configure and deploy ONAP components using Kubernetes, but after doing this, when I run the below command to check the pod status,
kubectl get pods --all-namespaces
there is a problem with the SDC and VNC components. They are not coming up.
onap-sdc sdc-be-754421819-b696x 1/2 ErrImagePull 0 1h
onap-sdc sdc-fe-902103934-qmf3g 0/2 Init:0/1 4 1h
onap-portal vnc-portal-3680188324-kjszk 0/1 Init:2/5 3 1h
I have used Docker 1.12 along with Rancher 1.6.10 and Helm 2.3.
I guess something changed in the chef scripts. I don't know the reason.
When I describe the sdc-be pod I get this error:
Normal Started 24m kubelet, k8s-2 Started container
Normal Pulling 21m kubelet, k8s-2 pulling image "docker.elastic.co/beats/filebeat:5.5.0"
Warning Failed 21m kubelet, k8s-2 Failed to pull image "nexus3.onap.org:10001/openecomp/sdc-backend:1.1-STAGING-latest": rpc error: code = 2 desc = net/http: request canceled
Could you please help me on that.
ATUL ANGRISH
Hi,
We are facing an issue while deploying pods, mainly SDC, using Kubernetes.
root@k8s-2:/# kubectl get pods --all-namespaces -a
onap-aai aai-resources-898583818-6ptc4 2/2 Running 0 1h
onap-aai aai-service-749944520-0jhxf 1/1 Running 0 1h
onap-mso mariadb-829081257-vx3n1 1/1 Running 0 1h
onap-mso mso-821928192-qp6tn 2/2 Running 0 1h
onap-sdc sdc-be-754421819-phch8 0/2 PodInitializing 0 1h
onap-sdc sdc-cs-2937804434-qn1q6 1/1 Running 0 1h
onap-sdc sdc-es-2514443912-c7fmd 1/1 Running 0 1h
onap-sdc sdc-fe-902103934-rlbhv 0/2 Init:0/1 8 1h
When we look at the logs of this container we can see that there are issues.
Please find below the steps to check the logs:
1) Run the kubectl command to check the pod status:
kubectl get pods --all-namespaces -a
onap-mso mso-821928192-qp6tn 2/2 Running 0 1h
onap-sdc sdc-be-754421819-phch8 0/2 PodInitializing 0 1h
onap-sdc sdc-cs-2937804434-qn1q6 1/1 Running 0 1h
onap-sdc sdc-es-2514443912-c7fmd 1/1 Running 0 1h
2) Use the docker ps -a command to list the containers.
root@k8s-2:/# docker ps -a | grep sdc-be
347b4da64d9c nexus3.onap.org:10001/openecomp/sdc-backend@sha256:d4007e41988fd0bd451b8400144b27c60b4ba0a2e54fca1a02356d8b5ec3ac0d "/root/startup.sh" 53 minutes ago Up 53 minutes k8s_sdc-be_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_0
2b4cf42b163a oomk8s/readiness-check@sha256:ab8a4a13e39535d67f110a618312bb2971b9a291c99392ef91415743b6a25ecb "/root/ready.py --con" 57 minutes ago Exited (0) 53 minutes ago k8s_sdc-dmaap-readiness_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_3
a066ef35890b oomk8s/readiness-check@sha256:ab8a4a13e39535d67f110a618312bb2971b9a291c99392ef91415743b6a25ecb "/root/ready.py --con" About an hour ago Exited (0) About an hour ago k8s_sdc-be-readiness_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_0
1fdc79e399fd gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df
3) Use this command to see the docker logs:
docker logs 347b4da64d9c 2>&1 | grep -iE "error|exception"
4) Observe the error logs and exceptions.
Currently we are getting the below exceptions:
Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration
2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] (sdc-normatives::upgrade_Normatives line 7) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.openecomp.sdcrests.health.rest.services.HealthCheckImpl]: Constructor threw exception; nested exception is java.lang.ExceptionInInitializerError
We are following below mentioned link for configuration.
https://wiki.onap.org/display/DW/ONAP+on+Kubernetes
We did the cleanup and reinstall multiple times but got the same issue again and again.
Regards
Atul
Brian Freeman
Installing on Azure - other than the network security groups via portal.azure.com screenshots seemed to go okay up to running cd.sh.
You need to number the steps, since sometimes it's not obvious when you are switching to a new task vs describing some future or optional part. Had to be careful not to blindly copy/paste, since you have multiple versions in the steps, some with notes like "# below 20171119 - still verifying - do not use", which was confusing. The video has the steps, which is good, but it's tedious to start/stop the video and then look at the next step in the wiki. I will update when it completes.
Do we need to add port 10250 to the security groups? I got error messages on cd.sh (but admittedly I didn't watch that part of the video).
Brian Freeman
It didn't come up cleanly, but perhaps I didn't wait long enough for something.
I did notice that the SDNC containers for dmaaplistener and ueblistener didn't get loaded, and SDNC stayed in Init.
root@ONAP-OOM1:~# kubectl get pods --all-namespaces | grep sdnc
onap-sdnc sdnc-1395102659-z7r64 0/2 Init:0/1 0 23m
onap-sdnc sdnc-dbhost-3029711096-2s7mg 0/1 ContainerCreating 0 23m
onap-sdnc sdnc-dgbuilder-4267203648-pf5xk 0/1 Init:0/1 0 23m
onap-sdnc sdnc-portal-2558294154-xn94v 0/1 Init:0/1 0 23m
onap-vfc vfc-ztesdncdriver-1452986549-rjj18 1/1 Running 0 23m
Michael O'Brien
Recorded the latest video from a clean EC2 VM - install, run cd.sh - 24m to 87 pods up - running healthcheck now - will post the video in 1 hour.
Yes, some images may not be in the prepull that is currently hardcoded - so a couple pods take a while
in general vnc-portal and then aai-service are the last to come up
/michael
Brian Freeman
Azure VMs seem to only have a 30GB OS disk. I can add a data disk but I think I should run the install from someplace other than root. Is that simple to change in cd.sh ?
Brian Freeman
Was able to complete bringing up ONAP on Azure through health check except for dcae
A few things missing:
Had to add a data disk to the Ubuntu VM.
Michael O'Brien
Yes, I forgot to point out the requirements on this page - you will need 70G to install ONAP and it will grow to about 90G over a week (mostly logs).
Curious about Azure filesystems - I had issues with non-EBS in the past - will recheck.
Will raise a JIRA on the HEAT to OOM sync for SDNC
OOM-491
for SDC - would raise a JIRA but I don't see the sanity container in HEAT - I see the same 5 containers in both
Brian Freeman
Check SB01 in Windriver - there is a sanity-check container that runs for their self-health check. I think it's only needed for troubleshooting.
You can see it in nexus
Brian Freeman
You did point out the disk size requirements in the video. The issue is really that AWS makes that a setting at VM create, whereas in Azure you have to separately create the data disk (or at least I couldn't find a way to do it on the original create via the portal).
Michael O'Brien
yes I see the docker image - just wondering where the docker container is in the SDC vm - my stack is 2 days old
I only see the front/back end ones, cassandra, elasticsearch and kibana - 5
let us know and I will raise a JIRA for SDC like we did for SDNC
wait - when I get back I'll check the compose file - perhaps it is optional - ok I see it in /data/scripts/docker_run.sh
docker run --detach --name sdc-sanity --env HOST_IP=${IP} --env ENVNAME="${DEP_ENV}" --env http_proxy=${http_proxy} --env https_proxy=${https_proxy} --env no_proxy=${no_proxy} --log-driver=json-file --log-opt max-size=100m --l
it is optional
docker run sdc-sanity
if [ ${RUNTESTS} = true ]; then
but we should run it
raising JIRA to add this optional container
OOM-492
thanks
/michael
Michael O'Brien
BTW, thanks Brian for the review - when I started I brought up HEAT in May 2017 and enumerated all the containers to get a feel - we should have done another pass on all the vms - but without someone who would know the optional ones like in SDC we would have missed the sdc-sanity one - thanks
/michael
Michael O'Brien
You can run the scripts from anywhere - I usually run as ubuntu not root - the reason the rancher script is root is because you would need to log out and back in to pick up the docker user config for ubuntu.
I run either directly in /home/ubuntu or /root.
The cloned directory will put oom in either of these.
For ports - yes try to open everything - on AWS I run with an all-open CIDR security group for ease of access - on Rackspace the VM would need individual port openings.
/michael
Michael O'Brien
Yes, the multiple steps are confusing - I was trying to help out a 2nd team that is working with Helm 2.7 to use the tpl function - I'll remove those until they are stable.
thanks
/michael
Michael O'Brien
Updated wiki - thought I removed all helm 2.6/2.7 - I was keeping the instructions on aligning the server and client until we fix the vnc-portal issue under helm 2.6 - this wiki gets modified a lot as we move through all the rancher/helm/kubernetes/docker versions.
Michael Phillip
Hi, I'm new to ONAP and cloud computing in general, but trying to work through the above guide. I'm at the point where I'm waiting for the onap pods to come up. Most have come up, but some seem to be stuck after 2 hrs. I'm wondering if perhaps I have insufficient memory available. I'm installing on a KVM VM with 16 vCPU, 55G RAM and 220G HD.
One thought is to shut down the VM, increase RAM to about 60G and restart, but I'm uncertain as to the potential implications. Any suggestions as to how I could proceed would be greatly appreciated.
Thanks,
Michael
James MacNider
Hi Michael Phillip,
Unless you've taken the step to remove some components from the HELM_APPS variable in the setenv.bash script (after the oom repository was cloned), you very likely require 64 GB of RAM.
I've successfully deployed a subset of the components in a 48GB RAM VM with HELM_APPS set to this:
HELM_APPS=('mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'log')
Michael Phillip
Thanks a lot James. I have 72G on my host, but would like to leave room for additional VMs, like vFirewall. So I'll try removing some components as you suggested. It will give me an opportunity to try the cleanup.
Thanks again,
Michael
ATUL ANGRISH
Hi Michael,
We tried to bring up the sdc pod in my setup but we are not able to get it up.
onap-sdc sdc-be-754421819-phch8 0/2 PodInitializing 0 1h
onap-sdc sdc-cs-2937804434-qn1q6 1/1 Running 0 1h
onap-sdc sdc-es-2514443912-c7fmd 1/1 Running 0 1h
onap-sdc sdc-fe-902103934-rlbhv 0/2 Init:0/1 8 1h
I think something has changed in the prepull_docker script.
We ran the prepull_docker script using:
# from OOM-328 - pulls in sequence
# For branch "release-1.1.0":
curl https://jira.onap.org/secure/attachment/10741/prepull_docker_110.sh > prepull_docker.sh
Anyone who tries to install/deploy the ONAP SDC container will hit an issue with the SDC pod coming up.
Exceptions:
Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration
2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] (sdc-normatives::upgrade_Normatives line 7) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.openecomp.sdcrests.health.rest.services.HealthCheckImpl]: Constructor threw exception; nested exception is java.lang.ExceptionInInitializerError
Regards
Atul
Michael O'Brien
Correct, looks like a standard spring bean startup error - specific to SDC - which should also be failing in the HEAT deployment. I tested release-1.1.0 last night to test a merge in oom and all my pods are up except the known aaf - also the CD job is OK.
http://jenkins.onap.info/job/oom-cd/621/console
Build #621 (7-Dec-2017 2:00:00 PM)
this bothers me though - as I hope we are not missing something that only you are seeing - will look more into it - are you using 1.1.0 or master (master may have issues)?
see parallel discussion last night
https://lists.onap.org/pipermail/onap-discuss/2017-December/006800.html
Also, are you bringing up everything - if you check the yaml there are dependencies.
In your onap-discuss post last night you did not have the dependent pods up - did this fix the issue? I quickly looked at the code and the HealthCheckImpl class is doing healthchecks - which I would expect to fail if the dependent pods are not up.
thank you
/michael
Brian Freeman
testsuite (robot) is an older version 1.1-STAGING-latest. How would I upgrade just testsuite to 1.2-STAGING:latest ?
It only loads the Demonstration customer, not SDN-ETHERNET-INTERNET which is needed for vCPE.
Alexis de Talhouët
Easiest way is to go to the Kubernetes UI, then under the onap-robot namespace, click on the Deployments tab, then click the three dots next to the deployment to update (in this case, robot); it will pop up a window where you can edit, among other deployment parameters, the image version. Then click update. This will bounce the deployment (hence the pod) and create a new deployment with the changes.
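If you prefer the CLI over the UI, a hedged equivalent is kubectl set image (the container name "robot" and the testsuite image path here are assumptions based on this thread, so check your own deployment/values first):
kubectl -n onap-robot set image deployment/robot robot=nexus3.onap.org:10001/openecomp/testsuite:1.2-STAGING-latest
kubectl -n onap-robot rollout status deployment/robot    # wait for the bounced pod to come back up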
Brian Freeman
The SDNC org.ops4j.pax.logging.cfg isn't the same as the file in gerrit. I noticed there is a different file in dockerdata-nfs/onap/log/sdnc that appears to come from the OOM repo instead of the CCSDK repo (the same OOM file looks to be used for appc). Why isn't the SDNC logging configuration being used?
Alexis de Talhouët
What you're mentioning, Brian, is the major issue we currently have in OOM:
we need to fork projects' config in order to adjust to the kubernetes context, whether it's for address resolution or for logging. I'll let Michael O'Brien explain what was done for the logs. But the overall purpose wrt logging is to centralize the logs and have them browsable through a Kibana interface (using logstash).
Regarding address resolution, kubernetes provides its own way of resolving services within namespaces, <service>.<namespace>:<internal-port>. Because of this, everywhere in the config where there is some network config we change it to leverage k8s networking.
Michael O'Brien
Brian, yes there is a centralized logging configuration that has the RI in the logging-analytics repo - this ELK stack available on the onap-log kibana container internal port 5601 uses a filebeat container (all the 2/2 pods) to pipe the logs in through a set of PV's using the emptyDir directive in the yaml. A logging spec is being worked out.
Logging User Guide#Quickstart-gettingyourELKDashboardup
I'll update this more shortly.
Brian Freeman
Well, the logging team needs to find a solution for the heavy users of the local logs, where we turn on DEBUG/TRACE and generate a huge amount of log entries while we step through the DG processing. The SDNC logging.cfg also creates the per-DG files of data. I guess I can simply replace the file in dockerdata-nfs with the version I can use for support, but it seems like we need a better solution that can fit both needs. Can't the logging.cfg support both the common onap logs and the SDNC-specific DEBUG logging in the /opt/opendaylight/current/data/log directory?
ATUL ANGRISH
HI Michael
I am using release 1.1.0. It was working till Monday 4th Dec, and then we cleaned up everything and redeployed the pods again to test something in my environment.
After that SDC-be and SDC-fe never come up. We tried this on 2-3 more setups but the problem still persists.
I suspect there is a problem with the prepull_docker.sh script - it is not able to pull the images we currently require for SDC.
/ATUL/oom/kubernetes/sdc/values.yaml
sdcBackend: nexus3.onap.org:10001/openecomp/sdc-backend:1.1-STAGING-latest
sdcFrontend: nexus3.onap.org:10001/openecomp/sdc-frontend:1.1-STAGING-latest
As you can see all my nodes are up except SDC-be and SDC-fe
root@k8s-2:/# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-km9jg 1/1 Running 8 2h
kube-system kube-dns-638003847-z8gnh 3/3 Running 23 2h
kube-system kubernetes-dashboard-716739405-xn4hx 1/1 Running 7 2h
kube-system monitoring-grafana-2360823841-fsznx 1/1 Running 7 2h
kube-system monitoring-influxdb-2323019309-qks0t 1/1 Running 7 2h
kube-system tiller-deploy-737598192-wlhmk 1/1 Running 7 2h
onap config 0/1 Completed 0 1h
onap-aai aai-resources-898583818-6ptc4 2/2 Running 0 1h
onap-aai aai-service-749944520-0jhxf 1/1 Running 0 1h
onap-mso mariadb-829081257-vx3n1 1/1 Running 0 1h
onap-mso mso-821928192-qp6tn 2/2 Running 0 1h
onap-sdc sdc-be-754421819-phch8 0/2 PodInitializing 0 1h
onap-sdc sdc-cs-2937804434-qn1q6 1/1 Running 0 1h
onap-sdc sdc-es-2514443912-c7fmd 1/1 Running 0 1h
onap-sdc sdc-fe-902103934-rlbhv 0/2 Init:0/1 8 1h
onap-sdc sdc-kb-281446026-tvg8r 1/1 Running 0 1h
Thanks
Atul
Michael O'Brien
Atul,
Hi you saw my previous comment on the dependent pods for SDC - do you have those up
http://jenkins.onap.info/job/oom-cd/
I am bringing up a clean release-1.1.0 environment to record an SDC video for another issue - so I will verify this again.
Anyway the healthcheck on the CD server is OK - the only difference is that the images are cached there right now - so on the off chance that the images were removed or not available via nexus3 - this will be seen on a clean EC2 server shortly. ( a real CD server that brings up a clean VM every time is in the works)
/michael
Michael O'Brien
In master (I am also testing a patch) - I get the following (ignore aaf) in master
could be an image issue (different images in 1.1.0 and master) - or a config issue that has not been cherry picked to master yet (we are running the reverse), note portal depends on sdc - sdc is the issue
Make sure you use release-1.1.0 - as this is our stable branch right now
Michael O'Brien
Atul,
See separate mail on onap-discuss - we are stabilizing master - doing the last of Alexis de Talhouët cherry picks from stable release-1.1.0 - then SDC and AAI should come up
I recommend running a full set of pods in release-1.1.0 for now - you can also assist in testing master once the merges are in so we can declare it open for pending feature commits
thank you
/michael
Michael O'Brien
Atul hi, thanks for the effort helping us stabilize - Alexis de Talhouët and the AAI team have fixed the 2 aai-service and aai-traversal issues that popped up 10am Friday on release-1.1.0 - you can use that branch again.
OOM-501
/michael
ATUL ANGRISH
Hi Michael,
Are you going to clean and rebuild release 1.1.0 for prepull_docker images?
Is there any alternative to proceed ?
I have tried release 1.1.0 again today in order to bring up all my ONAP components (especially AAI and SDC). But I am facing the same issue - my SDC component is not coming up.
Regards
Atul
Michael O'Brien
Atul, hi,
There is no issue with the prepull - it is just a script that greps the docker image tags for all values.yaml - v1.1.0 in most cases.
If you run cd.sh at the top of the page - it will clean your environment and upgrade it - or check out the commands if you want to do it yourself. There is no issue with the release-1.1.0 branch (besides a single not-required aaf container) - release-1.1.0 is stable as of 20171208:2300 EDT.
As a check, can you cover off each of the steps if you don't use the automated deploy script
(delete all pods, delete your config pod, remove dockerdata-nfs, source setenv.sh (make sure your onap-parameters.yaml is ok), create config, wait for it, (prepull is optional - it just speeds things up), create pods, run healthcheck, PUT cloud-region to AAI ...) - a rough sketch of these as commands is below.
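This is only a hedged sketch of that sequence, assuming the script names and paths used elsewhere on this page (run from the directory containing the oom clone, with setenv.bash next to createAll.bash) - not a verified recipe:
oom/kubernetes/oneclick/deleteAll.bash -n onap
kubectl -n onap delete pod config                         # remove the config pod
sudo rm -rf /dockerdata-nfs/onap                          # wipe the shared config
cd oom/kubernetes/config && ./createConfig.sh -n onap     # regenerate config (check onap-parameters.yaml first)
cd ../oneclick && source setenv.bash && ./createAll.bash -n onap   # recreate the pods, then run healthcheck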
Remember we have not had an answer yet on your config - sdc will not come up unless dependent pods are up - for example - just try to run everything to start - then fine tune a subtree of pods later.
please try the following script - it is running on the hourly CD server and 3-4 other environments OK
https://github.com/obrienlabs/onap-root/blob/master/cd.sh
Also verify you are running v1.6.10 of rancher with helm 2.3 on both the server and client - and kubectl 1.8+
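A quick way to confirm those client/server versions (standard commands, nothing OOM-specific):
docker --version
kubectl version --short     # client and server should both report 1.8+
helm version                # client and tiller should both report v2.3.x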
thank you
/michael
Michael O'Brien
Issue was that my cd.sh was not accounting for dirs other than /root and /home/ubuntu - because of a cd ~/
Fixed the script - thanks Atul and we are good
Sen Shu
Hi,
I am trying to deploy ONAP on AWS using Kubernetes.
Is it possible to install the ONAP components on separate VMs?
For example, install one of aaf's pods on a 64G VM, then install another aaf pod on a 32G VM.
Another question: does a namespace in Kubernetes equal a VM in HEAT (like the aaf VM, aai VM, etc. in the diagram)?
Could you please tell me about that.
Best regards,
Sen
Michael O'Brien
Sen,
Hi, there is a video at the top of this page where I bring up an R4 instance all the way to healthcheck on a single VM.
https://wiki.onap.org/download/attachments/8227431/20171206_oom_e2e_aws_install_to_healthcheck.mp4?version=1&modificationDate=1512608899000&api=v2
Yes it is possible to run as many hosts as you like - this is the recommendation for a scalable/resilient system - there is a link to the SDNC initiative above - essentially you need to share the /dockerdata-nfs directory.
SDN-C Clustering on Kubernetes
3. Share the /dockerdata-nfs Folder between Kubernetes Nodes
For your question about affinity - yes you can assign pods to a specific host - but kubernetes will distribute the load automatically and handle any failures for you - but if you want to change this you can edit the yaml either on the checked out repo - or live in the Kubernetes console.
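As a hedged example of that kind of host pinning (the node name, label key and deployment here are illustrative, not a documented OOM procedure): label the node, then add a matching nodeSelector, either in the checked-out yaml or live via kubectl patch:
kubectl label nodes k8s-node-1 onap-component=aai
kubectl -n onap-aai patch deployment aai-resources -p '{"spec":{"template":{"spec":{"nodeSelector":{"onap-component":"aai"}}}}}'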
There is the global namespace prefix "onap" and then the pod/component namespace ("aai", "aaf") - they combine as onap-aai - so the closest equivalent to the HEAT VM model would be the component namespace. However, a namespace like onap-aai could have HA containers where individual containers like aai-resources have 2 copies split across hosts, and parts of a component could also be split, like aai-resources on one host and aai-service on another. The global namespace prefix allows you to bring up several deployments of ONAP on the same kubernetes cluster, separated by namespace prefix and port assignment (300xx, 310xx for example).
Vidhu Shekhar Pandey
Hello,
I have installed ONAP on Kubernetes on a single host machine following the manual instructions
Now I am trying to run the vFW demo in my setup. I am facing an error when I am onboarding the vFW-vSINK VSP using the SDC portal. The error occurs during the asset creation process after the VSP is imported into the catalog. Here is the error, also attaching the screenshot
Error code SVC4614
Status code 400
invalid content Group type org.openecomp.groups.heat.HeatStack does not exist
To give a back ground of the processes followed:
I installed Kubernetes and Rancher. Kubernetes environment was created using Rancher portal and it showed healthy state.
onap_parameter.yaml file was edited according to my OpenStack setup running on a separate host.
Configuration was generated using
cd oom/kubernetes/config
./createConfig.sh -n onap
Helm APPS exported are
HELM_APPS=('mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'log')
I was able to bring up the ONAP containers individually, one by one, using the script
./createAll.bash -n onap -a XXX (for all Helm apps exported above )
I logged into the ONAP vnc portal and then logged on to SDC portal as designer (cs00008/demo123456!) to onboard the vFW demo VNF.
I created a new VLM which was checked in and submitted successfully
Then created the VSP vFW-vSINK and was able to upload the vFvSINK.zip yaml files, check in and submit the VSP successfully.
Importing this VSP in the catalog went fine but it was while creating the asset that I got this error.
Can someone help and suggest the possible cause?
Thanks,
Vidhu
Alexis de Talhouët
Hi Vidhu, which OOM branch have you used. You must use release-1.1.0 for now. Thanks
Vidhu Shekhar Pandey
Hi Alexis
Thanks for the information. Yes, I am using release-1.1.0. In fact I re-created the PODS once again and the error got resolved. Now I have reached a stage where I am able to create and distribute the vFW-vSINK services.
Regards,
Vidhu
pranjal sharma
Hi Vidhu,
how did you recreate the pods?
Thanks
Pranjal
Alan Chang
Dear all,
I used cd.sh to deploy ONAP in my environment. I always get a 500 error code from the SDC robot test (the same error as http://jenkins.onap.info/job/oom-cd/690/console).
I have checked the logs in sdc-be and got the following error.
Does anyone know how to solve this problem? Looking forward to your reply.
Blessings
Alan JW Chang
Alan Chang
Dear all,
I solved this problem by reinstalling the whole system beginning with rancher. Thanks a lot.
Blessings
Alan
Michael O'Brien
Alan, Hi, there are a couple components that fail healthcheck for up to 15 min after the readiness pod marks them as up - the liveness probe needs to be adjusted and the teams need to provide a better /healthcheck url
Unfortunately you experienced this.
SDC-739
As you can see from the graph, the failures are essentially random every hour - even though the CD server runs the healthcheck 3 times and waits about 6 min.
Kibana CD Dashboard
Beka Tsotsoria
SDC healthchecks fail constantly. Even in the CI build history there is a failure in every build output I checked. Also this graph shows different results now:
Kibana
Even if I wait more than 15 minutes, still no luck. What could be the workaround, any ideas?
UPDATE: I was finally able to get rid of SDC healthcheck failures by reinstalling only SDC several times:
However now I have following failures:
pranjal sharma
Hi Beka,
Were you able to resolve the above usecaseui-gui API health check issue? Since I am facing the same issue, it would be great if you have any workaround for it.
Thanks
Pranjal
Beka Tsotsoria
Hello Pranjal,
No, usecaseui-gui still fails even in jenkins: http://jenkins.onap.info/job/oom-cd/2123/console. I have not reached the point where I will need these failing services; maybe for most of the use cases they are not needed at all.
Beka
David Perez Caparros
Hi,
Regarding the usecase-gui health check issue, try the following in robot container:
sed -i 's/usecaseui/usecase-ui/g' /var/opt/OpenECOMP_ETE/robot/testsuites/health-check.robot
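(The file lives inside the robot container, so exec in first if you are running this from the host - the pod name is whatever kubectl reports in your environment:)
kubectl -n onap-robot get pods
kubectl -n onap-robot exec -it <robot-pod-name> -- bash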
That solved the issue for me.
David
pranjal sharma
Hello All,
I was able to create/deploy the vFirewall package (packet generator, sink and firewall VNFs) on the openstack cloud.
But I couldn't log in to any of the VNF VMs.
When I debugged, I saw I didn't replace the default public key with our local public key pair in the packet generator curl/JSON UI.
Now I am deploying the VNF again (the same vFirewall package) on the openstack cloud, this time giving our local public key in both the pg and sink JSON APIs.
I have some queries for clarification:
- how can we create a VNF package manually/dynamically using the SDC component (so that we can get into the VNF VM and access its capabilities)
- I want to implement Service Function Chaining for the deployed vFirewall; please let me know how to proceed with that.
PS: I have installed/deployed ONAP using rancher on kubernetes (on the openstack cloud platform) without the DCAE component, so I have not been able to use Closed Loop Automation.
Any thoughts will be helpful for us.
Thanks,
Pranjal
shubhra garg
Hi All,
Could you please let me know the significance of the CURL command as mentioned in the cd.sh ( the automated script )
The CURL query present in cd.sh ( the automated script to install ONAP pods ) is failing.
It has three parameters :
1. json file ( not sure whether we are supposed to use the same file as specified by ONAP community or we need to fill in our openstack details ). I have tried both.
2. a certificate file named aaiapisimpledemoopenecomporg_20171003.crt (which has NOT been attached along with the cd.sh script or specified anywhere else)
3. There is another header (-H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI="). If I use this header, the script is failing. If I remove this header, the PUT succeeds but the GET fails.
I am NOT sure of the significance of the below-mentioned curl command in the cd.sh file. I was just doing the vFirewall onboarding when I noticed that this CURL command is required.
Moreover, the robot scripts (both ./demo-k8s.sh init_robot and ./demo-k8s.sh init) are failing.
init_robot is failing: though we have entered "test" as the password, http is not accepting it.
The init testcase is failing, giving me a 401 authorization error.
Could you please help! Thanks in advance!
cd.sh snippet :
echo "run partial vFW"
echo "curl with aai cert to cloud-region PUT"
curl -X PUT https://127.0.0.1:30233/aai/v11/cloud-infrastructure/cloud-regions/cloud-region/CloudOwner/RegionOne --data "@aai-cloud-region-put.json" -H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=" -H "X-TransactionId:jimmy-postman" -H "X-FromAppId:AAI" -H "Content-Type:application/json" -H "Accept:application/json" --cacert aaiapisimpledemoopenecomporg_20171003.crt -k
echo "get the cloud region back"
curl -X GET https://127.0.0.1:30233/aai/v11/cloud-infrastructure/cloud-regions/ -H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=" -H "X-TransactionId:jimmy-postman" -H "X-FromAppId:AAI" -H "Content-Type:application/json" -H "Accept:application/json" --cacert aaiapisimpledemoopenecomporg_20171003.crt -k
sudo chmod 777 /dockerdata-nfs/onap
./demo-k8s.sh init
Michael O'Brien
Hi, the curls are an AAI PUT and GET on the cloud region - this is required as part of testing the vFW. For yourself it is optional until you need to test some use case like the vFirewall.
See the details on Running the ONAP Demos
For the aai cert - this cert is in the aai setup in your dockerdata-nfs , the json file is the body of the put - swap out your openstack tenantid
All of this is AAI specific, check the section on running AAI postman/curls in Vetted vFirewall Demo - Full draft how-to for F2F and ReadTheDocs and Tutorial: Verifying and Observing a deployed Service Instance and Verifying your ONAP Deployment and the AAI team dev page
If your init is failing then your cloud region and tenant are not set - check that you can read them in postman before running robot init (init_robot is only so you can see failures on the included web server - this should pass)
/michael
shubhra garg
Hi Michael,
Thank you so much for the instant response. Glad to see that all the queries have been addressed. But I am still facing some errors:
Could you please help!
BR,
Michael O'Brien
Unauthorized means either the encoded user/pass is wrong - it is AAI:AAI - or you don't have the AAI cert (old or 2018 new one)
I added a cert to this page - it is in the demo and oom repos as well - also you can get it exported from firefox.
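For the encoding part, a quick way to generate the Basic auth header value for AAI:AAI (plain shell, nothing ONAP-specific):
echo -n 'AAI:AAI' | base64     # prints QUFJOkFBSQ==
# then pass it in the curl header, e.g. -H "Authorization: Basic QUFJOkFBSQ=="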
A post from amsterdam.onap.info - the first is from the put, the rest are from robot init
buntu@ip-172-31-92-101:~$ curl -X GET https://127.0.0.1:30233/aai/v11/cloud-infrastructure/cloud-regions/ -H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=" -H "X-TransactionId:jimmy-postman" -H "X-FromAppId:AAI" -H "Content-Type:application/json" -H "Accept:application/json" --cacert aaiapisimpledemoopenecomporg_20171003.crt -k
{"cloud-region":[{"cloud-owner":"CloudOwner","cloud-region-id":"RegionOne","sriov-automation":false,"resource-version":"1513572496664","relationship-list":{"relationship":[{"related-to":"complex","related-link":"/aai/v11/cloud-infrastructure/complexes/complex/clli1","relationship-data":[{"relationship-key":"complex.physical-location-id","relationship-value":"clli1"}]}]}},{"cloud-owner":"CloudOwner","cloud-region-id":"IAD","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572501497"},{"cloud-owner":"CloudOwner","cloud-region-id":"HKG","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572502146"},{"cloud-owner":"CloudOwner","cloud-region-id":"DFW","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572502465"},{"cloud-owner":"CloudOwner","cloud-region-id":"ORD","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572502756"},{"cloud-owner":"CloudOwner","cloud-region-id":"SYD","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572501824"}]}
Michael O'Brien
fyi guys make sure to use aai v11 not v8 - for example
AAI-564
shubhra garg
Hi Michael,
But I am sure that every person needs to fill in their OWN OpenStack details (rather than using the default details as mentioned in the AAI json file).
The reason being that robot init is still failing. And if the robot testcase has to pick up our openstack details via the onap-parameters.yaml file (rather than the ones specified as defaults in the shared json file), then definitely in the AAI json file we should pass our openstack details only. Please advise!
2. Also, I think we need to create a separate region, like (RegionThree) etc., with our system's openstack details, to make new entries in AAI.
3. Also, as discussed, I have checked the integration robot file used by ONAP-robot; the AAI username and password are as mentioned below:
"/dockerdata-nfs/onap/robot/eteshare/config/integration_robot_properties.py"
GLOBAL_AAI_SERVER_PROTOCOL = "https"
GLOBAL_AAI_SERVER_PORT = "8443"
GLOBAL_AAI_USERNAME = "AAI"
GLOBAL_AAI_PASSWORD = "AAI"
4. I notice that the AAI logs are not getting updated when we run these CURL queries that enter data into AAI. Could you please let me know how to enable AAI logs?
The last AAI log update I can see on my system is from 12th Dec. But for the past few days we have constantly been running CURL queries to enter data into AAI.
I have logged in to the AAI-SERVICES container but no AAI logs can be seen. Screenshot attached for your reference.
5. Moreover, aai-services is not present in the dockerdata-nfs folder - not sure why. Other sub-modules are present though.
shubhra garg
Hi Michael,
Could you please let us know how to add a new object (cloud-owner) and a new region in AAI?
The CURL query and the json file required to add a new object and a new region are needed.
In our openstack setup we have "admin" as the user/cloud-owner, and we are trying to add our openstack details into AAI.
Also, we require the CURL query to add a new region, say "RegionFour", as mentioned in the "cloud-region-id".
our openstack details:
{
"cloud-owner": "admin",
"cloud-region-id": "RegionFour",
"cloud-region-version": "2",
"cloud-type": "openstack",
"cloud-zone": "nova",
"owner-defined-type": "publicURL"
}
Original aai-cloud-region-put.json file:
cat aai-cloud-region-put.json
{
"cloud-owner": "CloudOwner",
"cloud-region-id": "RegionOne",
"cloud-region-version": "v2",
"cloud-type": "SharedNode",
"cloud-zone": "CloudZone",
"owner-defined-type": "OwnerType",
"tenants": {
"tenant": [{
"tenant-id": "{TENANT_ID}",
"tenant-name": "ecomp-dev"
}]
}
}
Best Regards,
Shubhra
Michael O'Brien
Use Kubernetes 1.8.6 for now - not the just released 1.9.0 - https://github.com/kubernetes/kubernetes/issues/57528
OOM-522
Vaibhav Chopra
Yes, I found that with K8s 1.9 and the Amsterdam release, image pull secrets are failing.
Gary Wu
I set up two parallel OOM environments with docker 1.12.6, rancher 1.6.10, kubernetes 1.8.6, and helm 2.3.
On both of these, after the initial spin up, SDC would fail health checks with a 500 error even though all 5 SDC containers are running.
The SDC healthCheck API returns content as follows:
Once I restarted SDC via:
Then the SDC health check passes.
Is this a currently known issue?
Mohamed Aly ould Oumar
Hi,
This has been a known issue for a long time, and they don't have a solution for it.
Please don't bother yourself, it won't work no matter what you do.
I have installed ONAP more than 20 times, and even when everything is running, it always gives the same 500 error.
They haven't fixed it and they don't admit it.
Michael O'Brien
Gary, Mohamed,
Hi, We appreciate your exercising of the system. You likely have run into a couple issues we currently have with SDC healthcheck and Kubernetes liveness in general. Please continue to raise any jiras on issues you encounter bringing up and running ONAP in general. SDC is currently the component with the least accurate healthcheck in Kubernetes or Heat.
Currently SDC passes healthcheck about 74% of the time - if we wait about 8 min after the readiness probe declares all the containers as ready 1/1. The issue with SDC (26%), SDNC(8%), APPC (1%) in general is that their exposed healthcheck urls do not always report the system up at the appropriate time.
The workaround is to delay healthcheck for now until the containers have run for a bit - 5-10 min - which is a normal warming of the system and caches in a production system.
On the CD system, SDC comes up eventually 2/3 of the time - our issue is helping OOM and the component teams adjust the healthcheck endpoints to report proper liveness (not just 200 or a subset of rest functionality) - You both are welcome to help us with these and any other of our outstanding issues - we are expanding the team.
OOM SDC healthcheck failure 26% of the time even with 3 runs and 8 min wait state
SDC-739
The following is in progress and can also be reviewed
SDC-715
Related SDC issue in HEAT
SDC-451
Any assistance with the following is appreciated.
OOM-496
thank you
Michael O'Brien
Gary Wu
In my case, the SDC never passed health checks even after waiting a couple of hours after everything is "Running" in kubectl. They passed health checks only after I restarted SDC. Which JIRA issue do you think this info is applicable to?
Rahul Sharma
Gary Wu: For me, restarting SDC helped fix the Health-check. However when launching SDC UI, it failed to open (even though Health check was now passing).
For SDC-UI to work:
./deleteAll.bash -n onap
./createAll.bash -n onap
Gary Wu
For this, I had to fix /etc/hosts in vnc-portal to change the SDC IP addresses since they change once you restart SDC.
However, I think I'm going to just re-deploy the entire ONAP until SDC passes the health check, since I don't know what other things become out-of-date if SDC is restarted by itself.
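For anyone hitting the same thing, a hedged sketch of refreshing those entries after an SDC restart (pod names are whatever kubectl reports; the exact hostnames to update in /etc/hosts depend on your vnc-portal setup, and this assumes an editor is available in that container):
kubectl -n onap-sdc get pods -o wide                                # note the new sdc-fe / sdc-be pod IPs
kubectl -n onap-portal exec -it <vnc-portal-pod> -- vi /etc/hosts   # update the sdc entries to the new IPs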
Xiaobo Chen
I also hit the same SDC problem after deploying ONAP. The health check still did not pass even after I restarted sdc (./deleteAll.bash -n onap -a sdc and ./createAll.bash -n onap -a sdc) and waited 10 minutes. It seems all SDC components were running except TITAN. I checked the log in the sdc-be container: /var/lib/jetty/logs/SDC/SDC-BE/error.log.3, and found the Titan graph failed to initialize with an exception thrown: com.thinkaurelius.titan.core.TitanException. Any suggestion as to why Titan cannot work?
{
"sdcVersion": "1.1.0",
"siteMode": "unknown",
"componentsInfo": [
{
"healthCheckComponent": "BE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
},
{
"healthCheckComponent": "TITAN",
"healthCheckStatus": "DOWN",
"description": "Titan graph is down"
},
{
"healthCheckComponent": "DE",
"healthCheckStatus": "UP",
"description": "OK"
},
{
"healthCheckComponent": "CASSANDRA",
"healthCheckStatus": "UP",
"description": "OK"
},
{
"healthCheckComponent": "ON_BOARDING",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK",
"componentsInfo": [
{
"healthCheckComponent": "ZU",
"healthCheckStatus": "UP",
"version": "0.2.0",
"description": "OK"
},
{
"healthCheckComponent": "BE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
},
{
"healthCheckComponent": "CAS",
"healthCheckStatus": "UP",
"version": "2.1.17",
"description": "OK"
},
{
"healthCheckComponent": "FE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
}
]
},
{
"healthCheckComponent": "FE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
}
]
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<** createGraph started **>
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<** open graph with /var/lib/jetty/config/catalog-be/titan.properties started>
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<openGraph : try to load file /var/lib/jetty/config/catalog-be/titan.properties>
2018-01-08T09:59:10.719Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:10.726Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:15.580Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:15.581Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:16.467Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: 10.42.243.240>
2018-01-08T09:59:16.468Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<RemoveHost: sdc-cs.onap-sdc>
2018-01-08T09:59:23.938Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.t.g.c.GraphDatabaseConfiguration||ActivityType=<?>, Desc=<Set default timestamp provider MICRO>
2018-01-08T09:59:23.946Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.t.g.c.GraphDatabaseConfiguration||ActivityType=<?>, Desc=<Generated unique-instance-id=0a2a0d4d395-sdc-be-1187942207-21tfw1>
2018-01-08T09:59:23.956Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:23.956Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.052Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:24.052Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.153Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: 10.42.243.240>
2018-01-08T09:59:24.153Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<RemoveHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.164Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.titan.diskstorage.Backend||ActivityType=<?>, Desc=<Initiated backend operations thread pool of size 96>
2018-01-08T09:59:34.186Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<createGraph : failed to open Titan graph with configuration file: /var/lib/jetty/config/catalog-be/titan.properties>
com.thinkaurelius.titan.core.TitanException: Could not initialize backend
at com.thinkaurelius.titan.diskstorage.Backend.initialize(Backend.java:301) ~[titan-core-1.0.0.jar:na]
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1806) ~[titan-core-1.0.0.jar:na]
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:123) ~[titan-core-1.0.0.jar:na]
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94) ~[titan-core-1.0.0.jar:na]
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:62) ~[titan-core-1.0.0.jar:na]
at org.openecomp.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:256) [catalog-dao-1.1.0.jar:na]
at org.openecomp.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:207) [catalog-dao-1.1.0.jar:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_141]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_141]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:366) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:311) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:134) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:408) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1575) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:207) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1131) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1059) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:835) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:741) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:467) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1128) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1022) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:512) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
Xiaobo Chen
I solved this problem by reinstalling the SDC component several times. To make the web UI work, I had to change /etc/hosts in the PORTAL VNC.
ramki krishnan
From what I have seen so far, health check seems to succeed immediately after containers are ready provided the worker node has enough CPU/Memory. In my case, the worker node had 48 vCPUs and 64GB RAM.
Gary Wu
What is the current status on DCAE? Any specific instructions for starting up DCAE?
Gary Wu
Also, looks like if we use DEMO_ARTIFACTS_VERSION: "1.1.1" then multiple containers fail to start?
Syed Atif Husain
I have deployed onap oom using cd.sh but can't get to the portal.
2 onap-portal pods are failing, logs say
portalapps" in pod "portalapps-1783099045-zkfg8" is waiting to start: trying and failing to pull image
"vnc-portal" in pod "vnc-portal-3680188324-kzt7x" is waiting to start: PodInitializing
I tried deleting and creating but it did not help. Please advise.
Rahul Sharma
Syed Atif Husain: For PortalApps, it looks like your system was unable to pull the image. One way to work around it is to manually pull the image and also change the pullPolicy from Always to IfNotPresent (under $OOM_HOME/kubernetes/portal/values.yaml - see here).
For vnc-portal, the Pod would stay in 'PodInitializing' until the portalapps starts up, as it's defined as init-container dependency for vnc-portal (see here).
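A hedged sketch of that workaround (the image tag and the exact values.yaml key are taken from this thread and may differ per branch; the pod name is whatever kubectl reports):
docker pull nexus3.onap.org:10001/onap/portal-apps:v1.3.0
sed -i 's/pullPolicy: Always/pullPolicy: IfNotPresent/' $OOM_HOME/kubernetes/portal/values.yaml
kubectl -n onap-portal delete pod <portalapps-pod-name>   # let kubernetes recreate it using the local image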
Syed Atif Husain
Thanks Rahul Sharma, I tried that, but the manual pull is failing:
# docker pull nexus3.onap.org:10001/onap/portal-apps:v1.3.0
v1.3.0: Pulling from onap/portal-apps
30064267e5b8: Already exists
a771fb3918f8: Already exists
e726f32f5234: Already exists
f017a45e77ce: Already exists
a0726cff2538: Already exists
0edfd34a7120: Already exists
60f8916f4ad6: Already exists
d705b1b28428: Already exists
f60cc3eb4fd3: Already exists
d3f1c4df222e: Already exists
6ae6daeaff5c: Already exists
cc77e52e0609: Already exists
a5524884a276: Extracting [==================================================>] 6.893 MB/6.893 MB
964a83c06e36: Download complete
a0292615b06b: Download complete
e8af69e9e3e4: Download complete
d7a3048354e6: Download complete
failed to register layer: open /var/lib/docker/aufs/layers/d1ce30bb68ec6a15ab6eb8d4b3593cd36a89c99f8d484dfe8653d23e298a5093: no such file or directory
Rahul Sharma
Looks like the image was pulled but extraction is having issues. Not sure what the reason is - do you have enough space on your system?
Syed Atif Husain
Looks like a docker issue - I am able to pull the image on the other VM.
Brian Freeman
I needed to restart the sdnc dgbuilder container after loading DGs via mulitple_dgload.sh, and K8s started a new instance before I could do a docker start. What is the mechanism to restart a container to pick up a change made on persistent storage for the container?
Alexis de Talhouët
Either through the GUI - in the onap-sdnc namespace, under pods, delete the pod and K8S will automatically restart it - or through the CLI:
kubectl --namespace=onap-sdnc delete pods <pod-name>
Make sure to delete the pod, not the deployment.
Brian Freeman
Is that the same as a docker stop / docker start? Delete seems like it would be more like a docker rm?
Alexis de Talhouët
It's exactly a docker rm. With K8S you never stop/start a container, you rm and re-create it (this is done automatically by K8S when a pod is deleted). So if the changed data is persisted, then it's ok to delete the pod, hence delete the container, because the new one will pick up the new data.
K8S deployment manifest defines the contract for the pod, which in the end is the container. Deleting the pod does delete the container, and kubernetes, based on the deployment manifest, will re-create it. Hope it clarifies things.
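A small usage example of that flow (using the delete command above; the watch flag just lets you see the replacement pod come back):
kubectl --namespace=onap-sdnc delete pods <pod-name>
kubectl --namespace=onap-sdnc get pods -w    # watch the replacement pod get scheduled and start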
Brian Freeman
It does clarify things, but we will have to make sure the things we did in Docker - like editing a file inside the container and doing a stop/start or restart - can be done in K8s. This is actually a problem in debugging, where the project teams will have to make changes to support debugging in K8s. We had set up shared data in the container configuration so that we can edit values and then delete the pod to pick up the new values. This will be a tedious pain.
Alexis de Talhouët
At the end of the day, a docker stop / docker start is just a lazy way to restart the process(es) running within the container. If the process(es) to restart are not tied to the container liveness (e.g. PID 1), then instead of stopping and starting the container, we could simply stop and start the process within the container. I'm not too scared about this being a pain to debug, but we will see.
I doubt I'm familiar enough with all of them (there are around 80 containers as of today for the whole ONAP).
Brian Freeman
I think we need to add a volume link (-v in docker) for each app where we might need to modify configuration and do a restart - dgbuilder for instance has a script to bulk load DGs into the flows.json file, but right now this file would be lost whenever the dgbuilder/node-red pod is restarted. This would not happen in regular docker on a stop/start or restart.
Brian Freeman
We need to take a running instance of ONAP using OOM, change each application in some normal way, and then restart to confirm that on a restart we aren't losing data. This is something we did in the HEAT/Docker/DockerCompose environment to make sure all the persistent storage settings were correct. Since K8s does a recreate instead of a restart we may lose file-based configuration data. I would look at: add the vFW netconf mount to APPC, add a flow to DG builder, create and distribute a model, instantiate a vFW, execute a closed loop policy on the vFW and vDNS; then restart all containers and confirm that the data created is still there and the same control loops still run. I suspect right now with an OOM installation that parts might not survive a docker stop and a K8s re-create of the container (since we can't do a docker start).
Andrew Fenner
Hi,
I'm new to Kubernetes and to OOM, so the following question could have an obvious answer that I've completely missed.
Is there a reason not to use the following commands to expose the K8s containers so that you don't have to log on via the VNC server, which is just a pain?
kubectl expose services portalapps --type=LoadBalancer --port 8989 --target-port=8080 --name=frontend -n onap-portal
kubectl expose services sdc-fe --type=LoadBalancer --port 8181 --target-port=8181 --name=frontend -n onap-sdc
kubectl expose services vid-server --type=LoadBalancer --port 8080 --target-port=8080 --name=frontend -n onap-vid
This exposed the portal, VID and SDC so the K8S services could be used directly. Then the IP address to use can be found using
kubectl get services --all-namespaces=true | grep -i frontend
or you can assign the IP address using --external-ip=w.x.y.z
Then I just updated the hosts file as "normal"
Thanks
/Andrew
Michael O'Brien
Good question. I guess we live with port mapping requiring the vnc-portal so we can run multiple environments on the same host, each with 30xxx, 31xxx etc., but in reality most of us by default run one set of ONAP containers. Myself, when I work in postman I use the 30xxx ports, except for the SDC GUI which I use in the vnc-portal.
I think we need a JIRA to run ONAP in an effective single port mapping config where 8989, for example, maps to 8989 outside the namespace and not 30211 - for ease of development.
OOM-562
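In the meantime, kubectl port-forward is another hedged option for ad-hoc developer access to a single pod without changing any service definitions (the pod name and ports here are illustrative, based on the expose commands above):
kubectl -n onap-portal get pods
kubectl -n onap-portal port-forward <portalapps-pod-name> 8989:8080   # local 8989 -> container 8080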
Brian Freeman
How would I add
/opt/onap/sdnc/dgbuilder/releases
as a directory that is mapped from the host file system so that updates to the flows.json file in /opt/onap/sdnc/dgbuilder/releases/sndc1.0/flows/flows.json would persist across restarts/recreates of the container ?
Alternatively, is there a way to temporarily set the restart policy to never so that we can manually update flows.json and then restart the existing container?
Alexis de Talhouët
Brian,
To do so, update the sdnc dgbuilder deployment file to add the following:
This means you will mount the volume identified by the name at the specified mountPath.
The name here has to be the same as the one specified above; it serves as the ID to correlate the mounted folder.
The hostPath here implies that you have created the folder /dockerdata-nfs/{{ .Values.nsPrefix }}/sdnc/dgbuilder/releases on the host (where {{ .Values.nsPrefix }} is onap) and put the data you wish to persist in there.
With those additions, that is how the sdnc dgbuilder deployment would look.
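An equivalent change can also be sketched as a live json patch under the same assumptions (the volume name is illustrative, the deployment must already have volumes/volumeMounts lists to append to, and the host folder must exist and be populated first):
kubectl -n onap-sdnc patch deployment sdnc-dgbuilder --type=json -p '[
  {"op":"add","path":"/spec/template/spec/volumes/-","value":{"name":"dgbuilder-releases","hostPath":{"path":"/dockerdata-nfs/onap/sdnc/dgbuilder/releases"}}},
  {"op":"add","path":"/spec/template/spec/containers/0/volumeMounts/-","value":{"name":"dgbuilder-releases","mountPath":"/opt/onap/sdnc/dgbuilder/releases"}}
]'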
Brian Freeman
I made the changes to: /opt/onap/oom/kubernetes/sdnc/templates/dgbuilder-deployment.yaml
I created the release directory: /dockerdata-nfs/onap/sdnc/dgbuilder/releases
I stopped the current container but the restarted container didn't seem to write to the dockerdata-nfs directory ?
Do I need to redeploy the dgbuilder via rancher or kubectl somehow ?
Brian Freeman
kubectl -n onap-sdnc edit deployment/sdnc-dgbuilder
caused a redeployment, but dgbuilder didn't like the hostPath since files it was expecting aren't on the host until the dgbuilder image is pulled. Not sure if it's a permissions problem on the host directories.
Should we be using something more like EmptyDir{} (but that doesn't seem to take a path) ?
Alexis de Talhouët
Brian, I forgot to mention that the data has to be put in the persisted directory on the host first. Mounting the host directory will overwrite the directory in the container. So the first time, make sure all the data is in the persisted directory (on the host). Then when you start the pod, the persisted data will be mounted into the container. From there, you can edit the persisted data either from the server or from the pod itself.
Brian Freeman
OK that worked.
Michael O'Brien
Brian,
Hi again. Very good idea. A lot of the applications need a way to either expose config (log, db config) into the container or push data out (logs) to an NFS-mapped share on the host. My current in-progress understanding of Kubernetes is that it wraps docker very closely and adds on top of docker where appropriate. Many of the docker commands (exec, log, cp) are the same as we have seen. For static persistent volumes there are already some defined in the yamls using volumeMounts: and volumes:. We also have dynamic volumes (specific to the undercloud VIM) in the SDNC clustering poc - https://gerrit.onap.org/r/#/c/25467/23. We still need places where volume mounts can be done to the same directory that already has an emptyDir stream into Filebeat (which has a volume under the covers) - see below.
For example the following has a patch that exposes a dir into the container just like a docker volume or a volume in docker-compose - the issue here is mixing emptyDir (exposing dirs between containers) and exposing dirs outside to the FS/NFS
https://jira.onap.org/browse/LOG-52
This is only one way to do a static PV in K8S
https://jira.onap.org/secure/attachment/10436/LOG-50-expose_mso_logs.patch
I have used these existing volumes that expose the logback.xml file for example to move files into a container like the MSO app server in kubernetes from /dockerdata-nfs instead of using kubectl cp.
I myself will also look into PV's to replace the mounts in the ELK stack for the CD job - that is being migrated from docker-compose to Kubernetes and for the logging RI containers.
Going through this documentation now - to get more familiar with the different PV options - https://kubernetes.io/docs/concepts/storage/persistent-volumes/
For the question about whether we can hold off on container restarts to be able to manually update a json file exposed into the container: the model of Kubernetes auto-scaling is stateless. When I push pods without affinity rules, the containers randomly get assigned to any host, and bringing down a container - either manually or because of a health-initiated trigger - is usually out of the control of any OSS outside of Kubernetes, but there are callbacks. Rancher and Kubeadm for example sit northbound of Kubernetes and act as VIMs, and in the same way that a spot VM going down in EC2 gives a 2 min warning, I would expect we could register as a listener for at least a pre-stop of a container - even though it is only a second or 2. I would also like to verify this and document all of it on our K8S devops page - all good questions that we definitely need an answer for.
/michael
Brian Freeman
I had to modify cd.sh to change the parameters to deleteAll.sh.
#oom/kubernetes/oneclick/deleteAll.bash -n onap -y yes
oom/kubernetes/oneclick/deleteAll.bash -n onap
I was getting an error message since "-y" wasn't an allowed argument. Is cd.sh checked into gerrit.onap.org somewhere so we can reference that instead of the copy on the wiki? Maybe I'm just looking in the wrong spot.
Michael O'Brien
Brian, hi, you are using amsterdam - the change done by Munir has not been ported from master.
I retrofitted the CD script to fix the jenkins job and patched github to align with the new default prompt behaviour of deleteAll
yes, ideally all the scripts northbound of deleteAll should be in onap - I will move the cd.sh script into a ci/cd folder in OOM or in demo - as it clones oom inside.
Also, I'll put in an if statement on the delete, specific to amsterdam, so it does not require the -y option
OOM-528
Michael O'Brien
Actually I think this will be an issue for anyone master/amsterdam that has cloned before OOM-528 - essentially we need a migration plan
In my case I brought up an older image of master from before the change - and the cd.sh script fails on the -y option (because it is not resilient).
Therefore, unfortunately, anyone on an older branch either needs to do a git pull or edit cd.sh one time to remove the -y - after that you are OK and effectively upgraded to OOM-528.
I will add a migration line to the last onap-discuss on this
https://lists.onap.org/pipermail/onap-discuss/2018-January/007198.html
hope this helps
thank you
/michael
Nagaraja Upadyaya
Good Morning,
I am new to ONAP and yesterday I set up ONAP on a permanent AWS m4.large instance which uses a dynamic public IP. Today, I removed the existing ONAP environment and recreated a new environment in Rancher. After adding the environment, when I try to add a host, Rancher is not detecting the new public IP. In the register command Rancher is still referring to yesterday's public IP, which is no longer valid.
Please let me know the steps required to restart ONAP on a dynamic-IP-based server which needs to be shut down and restarted on a daily basis.
Thank you in advance
Best Regards,
Nagaraj
Nagaraja Upadyaya
I was able to restart ONAP after a restart of a dynamic-IP-based server by doing the following:
Before Shutdown :
a) Remove Host and ONAP environment from Rancher.
b) Remove .kube/config file before shutting down the server.
After Reboot :
c) Perform the steps required for registering the server with new IP on Rancher i.e.,by adding ONAP environment and host with new IP in Rancher,
d) Register server in Rancher by executing the command for registration provided in Add Host Page.
e) Generate Config in CLI page of ONAP Environment in Rancher and copy the content to .kube/config file on server.
f) Run command "cd.sh -b amsterdam", to drop and recreate namespace, containers and pods in K8s.
Please let me know if above approach is correct or is there any better way of starting ONAP on restart of a server with Dynamic IP.
Best Regards,
Nagaraj
Michael O'Brien
Nagaraja,
Hi, that is a common issue with Rancher - it needs a static IP or DNS name.
You have a couple of workarounds: elastic IP, elastic IP + domain name, editing the host registration URL in Rancher, or docker stop/rm of the rancher container and rerunning it.
I opt for elastic IP + DNS entry - in my case I register onap.info in Route53, create an EIP in the EC2 console, then associate the EIP with the instance (network interface) before bringing up rancher/kubernetes/helm - see the CLI sketch after the links below.
This will also allow you to save the AMI and bring it up later with a 20 min delay until it is fully functional - provided you keep the EIP and domain A record.
This is how the CD system works - see the following, but do not touch anything; it is used for deployment testing for the first 57 min of each hour. http://amsterdam.onap.info:8880/
ONAP on Kubernetes on Amazon EC2#AllocateanEIPstaticpublicIP(one-time)
ONAP on Kubernetes on Amazon EC2#CreateaRoute53RecordSet-TypeA(one-time)
ONAP on Kubernetes on Amazon EC2#AssociateEIPwithEC2Instance
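A hedged CLI equivalent of the EIP steps linked above (the instance and allocation IDs are placeholders; the Route53 A record can be created in the console as described):
# one-time: allocate an EIP in the VPC
aws ec2 allocate-address --domain vpc
# associate the EIP with the instance that runs rancher/kubernetes
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0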
Otherwise I recommend editing :8880/admin/settings and entering the new host registration URL/IP/DNS name.
let me know
/michael
Michael O'Brien
Sorry, I was answering your first question from memory this morning - I didn't realize you added a 2nd comment with your workaround. Yes, that is OK, but we agree it is a lot of work. What you may do - and I will try - is a very small static-IP host (a 4G machine) that does not run the ONAP pods; they will all have affinity to a 2nd 64G host that can have a dynamic IP - but the Rancher server host must be static.
Another workaround that I have not tried is an automated host registration/authentication via REST or CLI - this I need to research.
But still, the easier way is to bring up the EC2 VM with an EIP (it will cost $2 per month when not in use, though). You should have an allocation of 5 EIPs on your AWS account - I asked for 10.
/michael
Nagaraja Upadyaya
Thank you Michael
I will try the EC2 VM with an EIP option.
Best Regards,
Nagaraj
Hong Guan
Hi Michael ,
We ran prepull_docker.sh on 4 different k8s nodes at the same time and got 75, 78, 80 and 81 images (docker images | wc -l). We verified the pulling process using (ps -ef | grep docker | grep pull); all pulling processes were completed. Do you know why we got different numbers of images?
Thanks,
Hong
Michael O'Brien
Hong,
Yes, weird - intermittent errors usually point to the underlying cloud provider; I sometimes get pull errors and even timeouts - I used to get them on heat as well. There are issues with the nexus3 servers periodically due to load and upgrades, and I have heard about a serious regional issue with mirrors. I do not know the cloud provider that these servers run on - the issue may be there. The script is pretty simple - it greps all the values.yaml files for docker names and images. There were issues where it parsed incorrectly and tried to pull just the image name or just the image version - but these were fixed - hopefully no more issues with the sh script.
There also may be issues with docker itself with 80 parallel pulls - we likely should add a -serial flag - to pull in sequence - it would be less performant.
you can do the following on a clean system to see the parallel pulls in progress and/or count them
ps -ef | grep docker | grep pull | wc -l
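A hypothetical serial variant of the prepull (not the committed script - it assumes the values.yaml files carry full image:tag strings after "image:") would be something like:
# pull one image at a time instead of ~80 in parallel
for image in $(grep -h "image:" oom/kubernetes/*/values.yaml | awk '{print $2}' | sort -u); do
  docker pull "$image"
done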
In the end there should be no issues because anything not pulled in the prepull will just get pulled when the docker containers are run via kubectl - they will just start slower the first time.
Please note that there are a couple of "huge" images on the order of 1-2G, one of them for SDNC - and I have lately seen issues bringing up SDNC on a clean system - it required a ./deleteAll.bash -n onap -a sdnc and a re-run of ./createAll.
Another possibility is that docker is optimizing or rearranging the pulls and running into issues depending on the order.
Another issue is that the 4 different servers have different image sets - the docker images | wc -l may be picking up kubernetes server or client images only present on one or more of the nodes. If you look at a cluster of 4 servers - I have one - the master has a lot more images than the others, and the other 3 clients usually run different combinations of the 6 kubernetes system containers - for what reason I am still looking into - before you even bring up the onap containers.
Let's watch this - there is enough written here to raise a JIRA - which I will likely do.
Thank you for your diligence.
/michael
James Forsyth
Michael O'Brien - I am trying to bring up vid, robot, and aai w/ the latest oom, seeing this error on several aai pods:
Error: failed to start container "filebeat-onap-aai-resources": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/log/filebeat/logback/filebeat.yml\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/2234aef661aa61185f7fb8fd694ec59d29f82c2478d9de1beee0a282e4af4936\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/2234aef661aa61185f7fb8fd694ec59d29f82c2478d9de1beee0a282e4af4936/usr/share/filebeat/filebeat.yml\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
The config job seems to have failed with an error but it did create the files under /dockerdata-nfs/onap
onap config 0/1 Error 0 33m
however is this supposed to be a dir?
root@z800-kube:/dockerdata-nfs/onap/log/filebeat/logback# ls -l
total 4
drwxr-xr-x 2 root root 4096 Jan 11 17:25 filebeat.yml
Michael O'Brien
Jimmy,
Hi, good question and thank you for all your help with OOM code/config/reviews.
Hi, good question and thank you for all your help with OOM code/config/reviews.
That particular error "not a directory" is a sort of red herring - it means one of 2 things: either the container is not finished initializing (the PVs and volume mounts are not ready yet - the error will go away after the pod tree is stable), or your config pod had an issue - not recoverable without a delete/purge. These errors occur on all pods for a while until the hierarchy of dependent pods is up and each one goes through the init cycle. However, if you see these after the normal 7-15 min startup time and the pods do not get past their init/config stage, then you likely have an issue with the config pod pushing all the /dockerdata-nfs files (this is being removed and refactored as we speak), due to missing config in setenv.bash and onap-parameters.yaml (it must be copied to oom/kubernetes/config).
Also, that many failures usually means a config pod issue - or a full HD or RAM issue. If you have over 80G HD (you need 100G over time) and over 51G RAM, then it is a config pod issue.
How to avoid this. See the cd.sh script attached and linked to at the top of the page - this is used to provision a system automatically on the CD servers we run the hourly jenkins job on - the script can also be used by developers wishing a full refresh of their environment (delete, re-pull, config up, pods up, run healthcheck...)
https://github.com/obrienlabs/onap-root/blob/master/cd.sh
AutomatedInstallation
If you are running the system manually, use the cd.sh script or the manual instructions at the top in detail. The usual config issue is forgetting to configure onap-parameters.yaml (you will know this by checking the config pod status). The second usual issue is failing to source setenv.bash to pick up the docker and other env variables - this will also fail the config container.
kubectl get pods --all-namespaces -a
it must say
onap config 0/1 Completed 0 1m
do the following to see any errors - usually a missing $variable set
kubectl --namespace onap logs -f config
as of an hour ago these were the failing components - no AAI, vid or robot
As an additional reference you can refer to the running master CD job - for the times when you might think it is actually failing - not just locally.
http://jenkins.onap.info/job/oom-cd/1109/console
Also, AAI has not been failing healthcheck for at least the last 7 days - actually I think it has failed only once since the first week of Dec 2017 - it is one of the most stable ONAP components.
http://kibana.onap.info:5601
Let me know if this fixes your issues - if your config pod is busted, then you will need to deleteAll pods, purge the config pod and rerun setenv, the config pod and createAll - see the script for the exact details, and the sketch below.
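A rough sequence for that refresh, assuming the defaults used on this page (namespace prefix onap, oom cloned in the current directory):
cd oom/kubernetes/oneclick
source setenv.bash
./deleteAll.bash -n onap                 # remove the application pods
kubectl delete pod config -n onap        # purge the failed config pod if it is still listed
sudo rm -rf /dockerdata-nfs/onap         # optional: start from a completely clean config
cd ../config                             # onap-parameters.yaml must already be here
./createConfig.sh -n onap
cd ../oneclick
./createAll.bash -n onap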
If not we can triage further
thank you
/michael
James Forsyth
Thanks Michael O'Brien, I needed to refresh the config pod and once I got "Completed" I was able to get aai and several others going! Thanks for your help!
Andrew Fenner
This is a pretty basic question. I've been having some trouble getting SDNC running (still troubleshooting), and was then looking at the readiness docker image to understand how it works.
I think I understood most of it, but I couldn't figure out how the value of the "K8S_CONFIG_B64" environment variable is being set, as there seems to be some "magic" for this, and I was hoping somebody could give me a hint.
Thanks
/Andrew
Michael O'Brien
Andrew, hi, just to cover off SDNC - since clustering was put in, the images have increased in number and size, and there may be a timeout issue. So on a completely clean VM you may need to delete and create with -a sdnc to get around this issue, which only appears on slow machines (those with fewer than 16 cores).
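The delete/create Michael refers to, using the same commands as earlier in this thread (assuming the onap namespace prefix):
cd oom/kubernetes/oneclick
./deleteAll.bash -n onap -a sdnc
./createAll.bash -n onap -a sdnc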
if that was your issue - otherwise we need a jira
Alain Drolet
Last December (2017) I managed to deploy an almost-amsterdam version of ONAP using oom on a single Ubuntu VM.
I used a manual list of commands (cd.sh was not available at the time) as explained on this page.
The installation used:
Docker 1.12,
Rancher server 1.6.10,
Kubernetes 1.8.6,
Helm 2.3.0
Most containers came up. Over time (weeks) things degraded.
Back from the holidays, I tried to reinstall from scratch (this time I'm aiming for the amsterdam branch) and had issues with Rancher.
To remove the possibility that my host was corrupted in some way, today I used a brand new Ubuntu 16.04.4 VM and tried to create the same environment for ONAP.
I executed the commands in oom_rancher_setup_1.sh. I executed these by hand so that I could better control the docker installation and the usermod command.
I ended up with the same problem I had on my old VM, yesterday.
The problem is as follows:
In the Rancher Environment GUI I created a Kubernetes environment.
Once I made it the default the State became "Unhealthy".
Rancher won't tell you why!
Then I tried anyway to add a host.
When running the command:
The agent started to complain that it could not connect to the server.
SSL certification is failing.
I get an output like this:
The Unhealthy state might be due to the web client having the same communication issue.
This does not appear to be an ONAP specific issue, since I'm failing in one of the first installation step
which is to get a Rancher server and agent working together.
This behavior was only observed upon my return on January 9th.
In December I had no such issue.
Could a certificate be expired?
Where are these certificates? (In the docker images I suspect)
Am I the only one with this error?
Any help will be appreciated.
Thank you
Alain
Michael O'Brien
Alain,
Hi, welcome. Also very detailed and complete environment description - appreciated.
I am extremely busy still - but your post stood out. I will return in more detail on the weekend.
For now, yes, I have also had issues connecting the client - usually this involved a non-static IP, for example if I saved an AMI on AWS and got a different EIP. There are several fixes for that one - use a static EIP and/or assign a domain name to it. Also, you can retrofit your server - I turned off security on the CD poc for a couple of days:
http://amsterdam.onap.info:8880/admin/settings
change the following (not this server! just an example) in the "something else" textbox
I would hope this would work - but only if your 10.182.40.40 was changed from the original IP
Host Registration URL
What base URL should hosts use to connect to the Rancher API?
http://amsterdam.onap.info:8880
Don't include
/v1
or any other path, but if you are doing SSL termination in front of Rancher, be sure to use https://
Alain Drolet
Thank you for looking into this.
My host is a plain VMWare VM, with a fixed IP. Nothing fancy.
I'm currently doing deeper debugging of the SSL connection. I found that the rancher-agent fails in its run.sh script
on the line with
From what I understand (not confirmed) at this point, the VM should have data provided by the rancher server at /var/run/rancher. There should be various sub-dirs, some with cert data. In the past I saw some files there, but on my new host /var/run/rancher is empty! I think this is where the server and agent share cert data (does anyone know this?).
I'll keep the community posted If I find something interesting.
Michael O'Brien
OK good, I am running a VMware Workstation 14 VM on Windows at home and Fusion VMs on my Macs - I will look into it there as well.
/michael
Alain Drolet
Update:
I reproduced the same SSL issues using a small vagrant VM (2 CPU, 2GB).
The VagrantFile uses:
config.vm.box = "ubuntu/xenial64"
From this VM I ran the following commands:
I also tried rancher server v1.6.11. Same issues were seen.
Alain Drolet
Found it!
Of course it was a trivial mistake (that cost me a lot of time).
Unless you choose to use HTTPS and go through a lot of custom SSL configuration (as documented on the Rancher site), you should not use HTTPS.
Looking again at the examples on this page, only HTTP is used.
I guess Chrome added HTTPS by default and sent me on this mad chase!
When connecting to the rancher server using a browser, MAKE SURE to use an HTTP URL.
E.g. :
http://<your k8s server host>:8880
Then
When adding a host for the first time, you will be presented with a page asking to confirm the "Host Registration URL".
This should be the same as the URL you used in your browser.
In any case make sure it is HTTP, NOT HTTPS.
The command you will get to add the host in step 5 should be of the form:
Since the agent is instructed to connect to the server using http, you should be fine.
Moral of the story, beware of browser trying to help you too much!
Now I can have a nice weekend,
and move on to figuring real ONAP issue!
:-)
Michael O'Brien
Alain,
Nice, good heads up on the http vs https host registration issue to watch for
thank you
/michael
Pavan Gupta
Hi Alain,
Could you get through the issue? I have also manually installed the components, but unable to get ONAP up running. It would be helpful if you can list the steps taken to install and run onap.
Michael O'Brien
Pavan,
Hi, welcome.
Mostly automated undercloud, helm, kubernetes, oom - AutomatedInstallation
Manual procedures QuickstartInstallation
/michael
Alain Drolet
Hi Pavan
I could post my notes.
They would look like a summary of information already on this page.
If some think it would be useful, I could do so.
In order to avoid too much redundancy on this page, could you tell us a bit more about where you have issues.
Then maybe I could post a subset of my notes around this area.
Basically I see this installation being made of 2 major steps:
The first step means running oom_rancher_setup_1.sh (which in my case I ran manually), followed by some interaction in Rancher's web UI to create a k8s env and add a host.
After this step you should be able to go to the Rancher Web UI and see the rancher/kubernetes docker instances and pods running.
What do you see running or not?
Radhika Kaslikar
Hi All,
The SLI-API module for SDNC is missing from the below link, which health check makes use of.
Link to check the SLI-API : <hostIP>:<port of sdnc>/apidoc/explorer/index.html
The SLI-API module for APPC is present at the below mentioned location and the health check for it is passed.
Link to check the SLI-API : <hostIP>:<port of APPC>/apidoc/explorer/index.html
username : admin
password for both SDNC/APPC : Kp8bJ4SXszM0WXlhak3eHlcse2gAw84vaoGGmJvUy2U
The below is the snippet for SDNC and APPC health check report.
Kindly let us know how to resolve this issue.
How to make SLI-API available for SDNC, as the health check is failing for the same.
Snippet for the SLI-API missing from SDNC API doc page:
Snippet for the SLI-API PRESENT from APPC API doc page:
Michael O'Brien
Radhika, good triage - I would raise a JIRA with SDNC for this https://jira.onap.org/secure/RapidBoard.jspa?rapidView=39&view=planning
/michael
shubhra garg
Hi Michael,
Thank you. We are both on the same team.
You can find more supporting debugging for the same SDNC SLI-API in the attached document.
After running the installSdncDb.sh script, logging into the SDNC container and logging into the SDNC database, we found that the "VLAN_ID_POOL" table does not exist, though the database was showing that the mentioned table exists. It was present in a stale format.
<opt/sdnc/features/sdnc-sli# cat /opt/onap/sdnc/bin/startODL.sh
${SDNC_HOME}/bin/installSdncDb.sh
Table "VLAN_ID_POOL" present in the sdnctl database:
But, upon describing the table, it shows an error.
Solution: We removed the stale SDNC tables from the database location and restarted the SDNC pod; that resolved the above-mentioned error.
Best Regards,
Shubhra
Syed Atif Husain
Is there a wiki or video for the openstack setup needed for onap oom to openstack connectivity?
I am struggling with connecting the oom vm to the openstack vm and setting correct values in onap-parameters.yaml.
~atif
Rahul Sharma
Hey Syed Atif Husain,
I followed this page and it helped sort out the issues. See the comments section for details on the onap parameters.
Syed Atif Husain
Rahul Sharma I followed the steps on the link above but I am facing issues related to connectivity to Openstack. I guess I am missing some basic setup in my openstack.
I have created a network and subnet on openstack. I am using their ids in the param file for OPENSTACK_OAM_NETWORK_ID and OPENSTACK_OAM_SUBNET_ID respectively. What should I use for OPENSTACK_PUBLIC_NET_ID? Do I have to create another network? How do I ensure my ONAP VM is able to connect to the Openstack VM? [I have installed ONAP OOM on one Azure VM and Openstack on another VM].
Any pointers to these are highly appreciated.
Rahul Sharma
Syed Atif Husain: OPENSTACK_PUBLIC_NET_ID should be one of the networks on your Openstack that's publicly accessible. One of the public IP assigned to your vFW_x_VNF (x = SINC or PG) would belong to this network.
You don't need to create other networks: unprotected_private_net_id (zdfw1fwl01_unprotected), unprotected_private_subnet_id(zdfw1fwl01_unprotected_sub), protected_private_net_id(zdfw1fwl01_protected), protected_private_subnet_id(zdfw1fwl01_protected_sub) would be created as part of vFW_SINC stack deployment.
The "pub_key" attribute will be used to communicate with the VM on Openstack.
Note: the values sent in the SDNC-Preload step are used to create the stack; so if you want to update something, you can do it then.
Also, when I tested, my ONAP was running on Openstack; running ONAP on Azure should be similar considering that MultiVIM should take care of different platforms underneath but you can verify in that area. Have a look at the VF instantiation flow for Release 1.1 here
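A hedged sketch of the relevant onap-parameters.yaml entries (the values are placeholders you take from your own OpenStack; only the key names are from this discussion):
OPENSTACK_PUBLIC_NET_ID: "<uuid of an existing, externally reachable network>"
OPENSTACK_OAM_NETWORK_ID: "<uuid of the oam network you created>"
OPENSTACK_OAM_SUBNET_ID: "<uuid of the oam subnet you created>"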
Syed Atif Husain
Rahul Sharma Hi I tried the alternate steps on "ONAP on Kubernetes on Rancher in OpenStack" but I am getting an issue in step 4 'Create the Kubernetes host on OpenStack'
When I execute the curl command, the host appears in kub but it says 'waiting for ssh to be available' and it fails after 60 retries.
I have opened all ports and I am able to ssh to the openstack VM manually.
Pls advise
Rahul Sharma
Can you check if the 'K8S_FLAVOR, PRIVATE_NETWORK_NAME' etc exists on your Openstack? What is the output of the Curl command.
It's also advisable to post the query on the confluence page where you are facing the issue; that way it would help others.
Syed Atif Husain
I have posted my reply on that page, values of variables are correct
Pavan Gupta
Hello,
When I run cd.sh, the config pod isn't coming up. It's shown to be in an error state. Does anyone know why this happens? In the kubectl logs, I see the following error: 'DEPLOY_DCAE' must be set in onap-parameters.yaml.
Syed Atif Husain
Hey Pavan Gupta
You need to give the dcae-related params in the onap-parameters.yaml file. Otherwise, remove the dcae component from HELM_APPS in oom/kubernetes/oneclick/setenv.bash if you don't want to install dcae or if your openstack setup is not ready.
Refer to the manual instructions under the section 'quickstart installation'.
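Concretely (the key name comes from the config pod error; the value format and the exact app list are assumptions that vary by branch), either set the flag in oom/kubernetes/config/onap-parameters.yaml:
DEPLOY_DCAE: "false"
or drop dcae from the array in oom/kubernetes/oneclick/setenv.bash, which looks roughly like:
HELM_APPS=('consul' 'msb' 'mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'log')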
Pavan Gupta
Hi Syed,
I am using VMware ESXi host to bring up Ubuntu VM. will it work?
Pavan
Michael O'Brien
Pavan,
Just to be sure, I checked that the yaml was not changed recently:
https://git.onap.org/oom/tree/kubernetes/config/onap-parameters-sample.yaml
I won't have time until later today to check - but if the config container complains about a missing DCAE variable - then there is a chance the config yaml is missing it
Did you also source
https://git.onap.org/oom/tree/kubernetes/oneclick/setenv.bash
However currently the CD job is OK with the latest master (are you on Amsterdam by chance?)
http://jenkins.onap.info/job/oom-cd/1388/console
I also installed a clean machine yesterday with no issues - verify your onap-parameters.yaml file against the sample.
these work - just replace your keystone config for your openstack instance for VNFs
/michael
-----Original Message-----
From: Michael O'Brien
Sent: Tuesday, January 23, 2018 07:04
To: 'Pavan Gupta' <pavan.gupta@calsoftinc.com>
Subject: RE: Issues with cd.sh script
Pavan,
Hi, the script mirrors the manual instructions and runs ok on several servers including the automated CD server.
You place the 2 aai files, the onap-configuration.yaml file beside the cd.sh script and run it (this assumes you have run the rancher config ok)
I would need the error conditions pasted to determine if you missed a step - likely during the config pod bootstrap - could you post the errors on the config pod you see.
Also verify all versions and prerequisites, Rancher 1.6.10, helm 2.3.x, docker 1.12.x, Kubernetes 1.8.x
Try to come to the OOM meeting and/or raise a JIRA and we can look at it from there.
DCAE is in flux but there should be no issues with the 2.0.0 tag for the config container
/michael
-----Original Message-----
From: Pavan Gupta [mailto:pavan.gupta@calsoftinc.com]
Sent: Tuesday, January 23, 2018 06:09
To: Michael O'Brien <Frank.Obrien@amdocs.com>
Subject: Issues with cd.sh script
Hi Michael,
I have posted this query on the wiki page as well. I could get the installation script working and moved on to running cd.sh. The config pod is shown in an error state. I looked at the Kubernetes log and it says DEPLOY_DCAE should be set in the onap-parameters.yaml file. I tried setting this parameter, but the error still continues. Any idea what's going wrong or what needs to be done to resolve this issue?
Pavan
Pavan Gupta
Michael,
My onap-parameters.yaml has been taken from https://git.onap.org/oom/tree/kubernetes/config/onap-parameters-sample.yaml. I am doing the setup on a VMware ESXi host. Just wondering how the Openstack parameters will be used in this case. Has anyone set up ONAP on a VMware ESXi host?
Pavan
Alain Drolet
Which branch/version are you trying to install?
In my case I'm focussing on `amsterdam`.
For this you need to pick the sample file from the amsterdam branch.
Last time I checked the amsterdam branch and the master branch version were very different.
I used this command to fetch the sample file (and save it under the correct name):
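The exact command Alain used isn't shown here; a hedged equivalent (assuming cgit's 'plain' URL form on git.onap.org) would be:
wget -O onap-parameters.yaml "https://git.onap.org/oom/plain/kubernetes/config/onap-parameters-sample.yaml?h=amsterdam"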
Michael O'Brien
I have setup onap via OOM via Rancher on VMware Workstation 14 and VMware Fusion 8 with no issues
The config in onap-parameters.yaml must point to an openstack user/pass/tenant so that you can create a customer/tenant/region in AAI as part of the vFW use case. You can use any openstack or Rackspace config - you only need keystone to work until you get to SO instantiation.
In the future we will be able to configure Azure or AWS credentials via work being done in the Multicloud repo.
/michael
Andrew Fenner
Success!!!!
Hi, I got to the point of getting a VNF deployed using the kubernetes deployment, so I just wanted to let you know it can work in different environments.
I'm using Rancher and a host VM on a private Red Hat OpenStack.
A couple of local workarounds were needed, and I had to redeploy AAI as it didn't come up the first time.
However, SDNC didn't work and I had to change it from using the NFS server to using the kubernetes volumes, as I was getting an error in the nfs-provisioner-.... pod referring to all the ports, though I think I have them all open etc.
Why is volume handling for SDNC different to the other namespaces ?
/Andrew
Rahul Sharma
Andrew Fenner, Hi,
Volume handling for SDNC is done differently for 2 reasons:
Not sure why nfs-provisioner isn't starting for you when you have the ports open?
shubhra garg
Hi Michael,
List:
Vendor name : MyVendor
License agreement : MyLicenseAgreement
Entitlementpool : MyEntitlementPool
Service : vFW-vSINK-service
VSP : vFW-vSINK
2. After running the init robot testcase, we notice that only the default services are being listed. The service which we created using SDC is not visible in AAI.
3. The curl queries for SDC are not working. We tried many curl queries for the same, to fetch the service name/instance.
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/sdc2/rest/v1/consumers/
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/sdc/v1/catalog/services
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/sdc/v1/catalog/services
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/
https://{serverRoot}/sdc/v1/catalog/{assetType}?{filterKey}={filterValue}
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/sdc/v1/catalog/services/vFW-vSINK-service
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" https://127.0.0.1:30205/sdc/v1/registerForDistribution -u"cs0008:demo123456!"
Any help would be appreciated!
Best Regards,
Shubhra
Pavan Gupta
Can we install ONAP on an Ubuntu 16.04 VM? Onap-parameters.yaml has the 14.04 version mentioned. Will that make any difference to the installation?
Pavan
Alain Drolet
Hi Pavan
I did my deployment on Ubuntu 16.04.4 with no issue related to the host OS version.
Michael O'Brien
Pavan, hi, that ubuntu 14 version is a leftover from the original heat parameters - it was used to spin up VMs (the original 1.0 heat install had a mix of 14/16 VMs - I don't know why we don't also list the 16 version). You can ignore it, as we are only using docker containers in Kubernetes right now.
The only reason we are targeting 16.04 is it is the recommended version of our Kubernetes manager RI (Rancher for now) - you can also use Kubeadm - http://rancher.com/docs/rancher/v1.6/en/installing-rancher/installing-server/#single-container
/michael
Michael O'Brien
Heads up that we can now use Helm 2.6+ (verified 2.7.2, working on 2.8.0) so that tpl templating can be used - as of 20180124:0800 EST on the master branch.
Openstack, Rackspace, AWS EC2 (pending Azure VM, GCE VM)
current validated config is Rancher 1.6.10+, Helm 2.7.2, Kubernetes 1.8.6, Docker 1.12
In progress - Rancher 1.6.14, Helm 2.8.0, Kubernetes 1.8.6, Docker 17.03.2 - OK on Rackspace and AWS EC2/EBS
Need to verify 1.9.0
OOM-486
Pavan Gupta
After the installation, I tried http://10.22.4.112:30211 on the browser and the ONAP portal didn't open up. Not all services are shown 1/1 (please check the output below)
kubectl get pods --all-namespaces
onap-vfc vfc-ztevnfmdriver-726786078-jc7b4 0/1 ImagePullBackOff 0 12h
onap-aaf aaf-1993711932-h3q31 0/1 Running 0 12h
I am not sure why I can't see the onap portal now.
Following is the error msg from Kubernetes. It's not able to pull the container image.
Failed to pull image "nexus3.onap.org:10001/onap/vfc/ztevnfmdriver:v1.0.2": rpc error: code = 2 desc = Error: image onap/vfc/ztevnfmdriver:v1.0.2 not found
Error syncing pod
shubhra garg
Pavan Gupta
Check the values.yaml file for the respective ONAP component (oom/kubernetes/<component>/values.yaml - say vfc or portal or MSO etc.) and look for the image pull policy option.
Set it to Always.
Then do a docker pull for the respective image.
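A hedged way to check exactly what Kubernetes is trying to pull (the vfc path follows the oom/kubernetes/<component>/values.yaml layout mentioned above; the image:tag is a placeholder):
# find the image reference used by the failing pod
grep -i "ztevnfmdriver" oom/kubernetes/vfc/values.yaml
# then try to pull exactly that image:tag by hand to see whether it exists in nexus3
docker pull nexus3.onap.org:10001/<image>:<tag from values.yaml>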
Best Regards,
Shubhra
Marcus Williams
I'm seeing kube2msb pod failing to come up when deploying oom using './cd.sh -b amsterdam' :
onap-kube2msb kube2msb-registrator-1382931887-565pz 0/1 CrashLoopBackOff 8 17m
kube2msb logs:
1/25/2018 11:06:59 AM2018-01-25 19:06:59.777976 I | Using https://kubernetes.default.svc.cluster.local:443 for kubernetes master
1/25/2018 11:06:59 AM2018-01-25 19:06:59.805097 I | Could not connect to Kube Masterthe server has asked for the client to provide credentials
Has anyone seen this issue or know how to solve it?
Rancher v1.6.10
kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.7-rancher1", GitCommit:"a1ea37c6f6d21f315a07631b17b9537881e1986a", GitTreeState:"clean", BuildDate:"2017-10-02T21:33:08Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
helm version
Client: &version.Version{SemVer:"v2.3.0", GitCommit:"d83c245fc324117885ed83afc90ac74afed271b4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.3.0", GitCommit:"d83c245fc324117885ed83afc90ac74afed271b4", GitTreeState:"clean"}
docker 1.12.6
Alexis de Talhouët
I guess this would be on Amsterdam. You need to update the kube2msb deployment file with your K8S token. In Rancher, under your environment, go to Kubernetes → CLI → Generate Config; this should give you your token to authenticate to the K8S API for your deployment.
Marcus Williams
Thanks Alexis - I tried exactly what you suggested but it still wasn't working (thus the above post).
It is now working. I did two things and I'm not sure which fixed the issue:
shubhra garg
Hi Marcus Williams
In the below threads, check my detailed response for CrashLoopBackOff error state.
It worked for us and has resolved the issue.
In short, put back the backup of the dockerdata-nfs folder and then do a clean delete of the ONAP pods.
Then delete the dockerdata-nfs folder and bring up fresh ONAP pods.
Check the below response.
ravi rao
After completing the ./createAll.bash -n onap I see every pods up and running except for
onap-portal vnc-portal-845d84676c-jcdmp 0/1 CrashLoopBackOff 17 1h
Logs Indicate that x11vnc exited:
stored passwd in file: /.password2
/usr/lib/python2.7/dist-packages/supervisor/options.py:297: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2018-01-25 21:47:52,310 CRIT Supervisor running as root (no user in config file)
2018-01-25 21:47:52,310 WARN Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2018-01-25 21:47:52,354 INFO RPC interface 'supervisor' initialized
2018-01-25 21:47:52,357 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-01-25 21:47:52,357 INFO supervisord started with pid 44
2018-01-25 21:47:53,361 INFO spawned: 'xvfb' with pid 51
2018-01-25 21:47:53,363 INFO spawned: 'pcmanfm' with pid 52
2018-01-25 21:47:53,365 INFO spawned: 'lxpanel' with pid 53
2018-01-25 21:47:53,368 INFO spawned: 'lxsession' with pid 54
2018-01-25 21:47:53,371 INFO spawned: 'x11vnc' with pid 55
2018-01-25 21:47:53,373 INFO spawned: 'novnc' with pid 56
2018-01-25 21:47:53,406 INFO exited: x11vnc (exit status 1; not expected)
2018-01-25 21:47:54,681 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: pcmanfm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: lxpanel entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: lxsession entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,683 INFO spawned: 'x11vnc' with pid 68
2018-01-25 21:47:54,683 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:56,638 INFO success: x11vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Has anyone seen this problem ??
Regards,
Ravi
Winnie Tsang (IBM)
Hi Ravi,
I encounter the same problem too. Did you find a solution or workaround for this issue yet?
Best Regards,
Winnie
shubhra garg
Hi Winnie Tsang (IBM) ravi rao
The ONAP system/pods enter the CrashLoopBackOff state only when you delete the dockerdata-nfs for the respective ONAP component.
For example, rm -rf /dockerdata-nfs/portal has been run. Now the ONAP system has no way of knowing which data to delete, so there are uncleaned/dangling links.
Solution:
For the vnc-portal, I faced a similar issue today:
kubectl describe po/<vnc-portal-pod> -n onap-portal
Note: the respective image will be missing.
docker pull <image name>
Kubernetes will pick up the newly pulled docker image. The issue for vnc-portal will be resolved.
Best Regards,
Shubhra
ravi rao
Hi Shubhra,
Thanks for the detailed steps. I did pull all the docker images that the portal app depends on, and I still see the same error:
onap-portal vnc-portal-56c8b774fb-wvv2d 0/1 PostStartHookError 4 1m
The main issue is that with this error I cannot get to the vnc-portal and hence cannot access the portal UI. Any help is greatly appreciated.
Regards,
Ravi
Michael O'Brien
Guys, it helps if you post your versions (onap branch, helm version, kubernetes version, rancher version, docker version), whether your config container ran OK (0/1 Completed), and whether you have all dependent containers up (for example vnc-portal needs vid to start).
A common issue is helm-related (helm 2.5+ running on amsterdam - stick to 2.3 on that branch);
for example, only master works with helm 2.5+.
OOM-441
ravi rao
Hi Michael,
Below are details in my env..
ubuntu@onap-rancher-vm:~/oom/kubernetes/oneclick$ helm version
Client: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"}\
When you say helm 2.5+, are you referring to the server version or the client? I only installed helm client v2.1.3 and I think rancher installs the helm server.
The onap branch I am using is amsterdam.
All the pods are up and running except for the vnc-portal container in the onap-portal namespace and the elasticsearch container in onap-log.
ubuntu@onap-rancher-vm:~/oom/kubernetes/oneclick$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-76b8cd7b5-zjk6h 1/1 Running 0 2d
kube-system kube-dns-5d7b4487c9-6srpr 3/3 Running 0 2d
kube-system kubernetes-dashboard-f9577fffd-sjmst 1/1 Running 0 2d
kube-system monitoring-grafana-997796fcf-k5hfs 1/1 Running 0 2d
kube-system monitoring-influxdb-56fdcd96b-sm5vt 1/1 Running 0 2d
kube-system tiller-deploy-cc96d4f6b-gjm6p 1/1 Running 0 2d
onap-aaf aaf-849d477595-rxhfk 0/1 Running 0 1d
onap-aaf aaf-cs-6f989ff9cb-g9xrg 1/1 Running 0 1d
onap-aai aai-resources-64cc9b6757-wjq7v 2/2 Running 0 1d
onap-aai aai-service-8cd946dbf-mxt9l 1/1 Running 5 1d
onap-aai aai-traversal-984d55b6d-75dst 2/2 Running 0 1d
onap-aai data-router-df8bffd44-lfnv8 1/1 Running 0 1d
onap-aai elasticsearch-6b577bf757-rpdqn 1/1 Running 0 1d
onap-aai hbase-794b5b644d-gdsh9 1/1 Running 0 1d
onap-aai model-loader-service-6684c846db-g9hsl 2/2 Running 0 1d
onap-aai search-data-service-77bdb5f849-hjn56 2/2 Running 0 1d
onap-aai sparky-be-69d5667b5f-k6tck 2/2 Running 0 1d
onap-appc appc-86cc48f4c4-q8xgw 2/2 Running 0 1d
onap-appc appc-dbhost-7bd58565d9-fqrvs 1/1 Running 0 1d
onap-appc appc-dgbuilder-78746d5b75-t8988 1/1 Running 0 1d
onap-clamp clamp-5fdf8b7d5f-2mckp 1/1 Running 0 1d
onap-clamp clamp-mariadb-64dd848468-snmmh 1/1 Running 0 1d
onap-cli cli-6885486887-hcvgj 1/1 Running 0 1d
onap-consul consul-agent-5c744c8758-8spjs 1/1 Running 1 1d
onap-consul consul-server-687f6f6556-cz78t 1/1 Running 2 1d
onap-consul consul-server-687f6f6556-vl7lj 1/1 Running 2 1d
onap-consul consul-server-687f6f6556-xb8kt 1/1 Running 1 1d
onap-dcaegen2 heat-bootstrap-6b8db64547-gzcnd 1/1 Running 0 1d
onap-dcaegen2 nginx-7ddc7ffc78-lvt7s 1/1 Running 0 1d
onap-esr esr-esrgui-68cdbd94f5-x26vg 1/1 Running 0 1d
onap-esr esr-esrserver-7fd9c6b6fc-8dwnd 1/1 Running 0 1d
onap-kube2msb kube2msb-registrator-8668c8f5b9-qd795 1/1 Running 0 1d
onap-log elasticsearch-6df4f65775-9b45s 0/1 CrashLoopBackOff 539 1d
onap-log kibana-846489d66d-98fz8 1/1 Running 0 1d
onap-log logstash-68f8d87968-9xc5c 1/1 Running 0 1d
onap-message-router dmaap-59f79b8b6-kx9kj 1/1 Running 1 1d
onap-message-router global-kafka-7bd76d957b-bpf7l 1/1 Running 1 1d
onap-message-router zookeeper-7df6479654-psf7b 1/1 Running 0 1d
onap-msb msb-consul-6c79b86c79-9krm9 1/1 Running 0 1d
onap-msb msb-discovery-845db56dc5-zq849 1/1 Running 0 1d
onap-msb msb-eag-65bd96b98-vbtrx 1/1 Running 0 1d
onap-msb msb-iag-7bb5b74cd9-5bx4m 1/1 Running 0 1d
onap-mso mariadb-5879646dd5-mb98c 1/1 Running 0 1d
onap-mso mso-7bfc5cf78c-28llb 2/2 Running 0 1d
onap-multicloud framework-6877c6f4d-xv6rm 1/1 Running 0 1d
onap-multicloud multicloud-ocata-5c955bcc96-6qjhz 1/1 Running 0 1d
onap-multicloud multicloud-vio-5bccd9fdd7-qcjzq 1/1 Running 0 1d
onap-multicloud multicloud-windriver-5d9bd7ff5-n7grp 1/1 Running 0 1d
onap-policy brmsgw-dc766bd4f-9mrgf 1/1 Running 0 1d
onap-policy drools-59d8499d7d-jck5l 2/2 Running 0 1d
onap-policy mariadb-56ffbf5bcf-hf9f5 1/1 Running 0 1d
onap-policy nexus-c89ccd7fc-n4g9j 1/1 Running 0 1d
onap-policy pap-586bd544d7-gxtdj 2/2 Running 0 1d
onap-policy pdp-78b8cbf8b4-fh2hf 2/2 Running 0 1d
onap-portal portalapps-7c488c4c84-8x4t9 2/2 Running 0 1h
onap-portal portaldb-7f8547d599-hwcp7 1/1 Running 0 1h
onap-portal portalwidgets-799dfd79f6-5q85k 1/1 Running 0 1h
onap-portal vnc-portal-56c8b774fb-dl46s 0/1 CrashLoopBackOff 20 1h
onap-robot robot-959b68c94-7n9kh 1/1 Running 0 1d
onap-sdc sdc-be-6bf4f5d744-xk5l6 2/2 Running 0 1d
onap-sdc sdc-cs-6bfc44d4fc-s5nnz 1/1 Running 0 1d
onap-sdc sdc-es-69f77b4778-th98q 1/1 Running 0 1d
onap-sdc sdc-fe-84646b4bff-fczlr 2/2 Running 0 1d
onap-sdc sdc-kb-5468f987d9-5wklh 1/1 Running 0 1d
onap-sdnc dmaap-listener-5956b4c8dc-9c4wm 1/1 Running 0 1d
onap-sdnc sdnc-968d56bcc-6q24c 2/2 Running 0 1d
onap-sdnc sdnc-dbhost-7446545c76-lkhj6 1/1 Running 0 1d
onap-sdnc sdnc-dgbuilder-55696ffff8-6mtqh 1/1 Running 0 1d
onap-sdnc sdnc-portal-6dbcd7c948-tqtj9 1/1 Running 0 1d
onap-sdnc ueb-listener-66dc757b5-f4r6m 1/1 Running 0 1d
onap-uui uui-578cd988b6-m7v72 1/1 Running 0 1d
onap-uui uui-server-576998685c-sb6kk 1/1 Running 0 1d
onap-vfc vfc-catalog-6ff7b74b68-6j4q8 1/1 Running 0 1d
onap-vfc vfc-emsdriver-7845c8f9f-w2vgf 1/1 Running 0 1d
onap-vfc vfc-gvnfmdriver-56cf469b46-wsg4r 1/1 Running 0 1d
onap-vfc vfc-hwvnfmdriver-588d5b679f-zpcj6 1/1 Running 0 1d
onap-vfc vfc-jujudriver-6db77bfdd5-qz4fk 1/1 Running 0 1d
onap-vfc vfc-nokiavnfmdriver-6c78675f8d-4k5mx 1/1 Running 0 1d
onap-vfc vfc-nslcm-796b678d-nvvfd 1/1 Running 0 1d
onap-vfc vfc-resmgr-74f858b688-shkzw 1/1 Running 0 1d
onap-vfc vfc-vnflcm-5849759444-fcrft 1/1 Running 0 1d
onap-vfc vfc-vnfmgr-77df547c78-lwp97 1/1 Running 0 1d
onap-vfc vfc-vnfres-5bddd7fc68-s6spr 1/1 Running 0 1d
onap-vfc vfc-workflow-5849854569-sd249 1/1 Running 0 1d
onap-vfc vfc-workflowengineactiviti-699f669db9-s99n8 1/1 Running 0 1d
onap-vfc vfc-ztesdncdriver-5dcf694c4-fsdf2 1/1 Running 0 1d
onap-vfc vfc-ztevmanagerdriver-6c8d776f5c-68spg 1/1 Running 0 1d
onap-vid vid-mariadb-575fd8f48-x95t6 1/1 Running 0 1d
onap-vid vid-server-6cdf654d86-x72lc 2/2 Running 0 1d
onap-vnfsdk postgres-5679d856cf-gz5d4 1/1 Running 0 1d
onap-vnfsdk refrepo-7d9665bd47-cv6h5 1/1 Running 0 1d
ravi rao
onap branch - Amsterdam
helm versions
Client: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"}\
Kubernetes version
ubuntu@onap-rancher-vm:~/oom/kubernetes/oneclick$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.5-rancher1", GitCommit:"6cb179822b9f77893eac5612c91a0ed7c0941b45", GitTreeState:"clean", BuildDate:"2017-12-11T17:40:37Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
rancher version - 1.6.14
Docker version on rancher VM - 17.03.0-ce
Docker version on Kubernetes VM - 1.12.6
Regards,
Ravi
Michael O'Brien
Hi, your versions are mismatched across your rancher/kubernetes servers (and your helm client/server) - try collocating them on one VM first until your system is up.
see the top of the page
You need helm 2.6.1 in order to run the tpl templates in the yamls in master
You need to run an older version of helm 2.3.1 in amsterdam so that vnc-portal will startup
(Version matrix: Release / Kubernetes / Helm / Kubectl / Docker - see the table at the top of the page.)
Kumar Lakshman Kumar
Hi Ravi,
Did you get the vnc-portal container in the onap-portal namespace and the elasticsearch container in onap-log working?
I resolved my elasticsearch container issue by increasing vm.max_map_count (sudo sysctl -w vm.max_map_count=262144) on the host the pod is running on.
But still no luck with vnc-portal - in the log I see the x11vnc process keeps getting restarted; not sure how to fix this issue.
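For reference, making that elasticsearch setting survive a reboot (this assumes standard Ubuntu sysctl handling on the host):
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p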
ravi rao
Hi Kumar,
I followed the instructions specified in the below post by kranthi to solve the problem.
NOTE: The main reason for this issue is that I did not have the recommended versions of helm/rancher & kubernetes. It was not so easy to align the versions, so I tried the fix suggested below and it worked for me. You can also try it and see if it solves your issue.
Regards,
Ravi..
kranthi guttikonda
I had the same problem with the Amsterdam branch. The master branch has fixes to resolve this. Basically, the helm chart defines a lifecycle PostStart hook which may run before the container itself starts (it's not guaranteed). So, please take the portal folder from the master branch and replace it in Amsterdam, or just replace the resources folder inside portal (from master) and also the portal-vnc-dep.yaml file inside templates, from master into Amsterdam:
helm delete --purge onap-portal
cd oom/kubernetes
helm install --name onap-portal ./portal
Michael O'Brien
elasticsearch memory issue - fixed: OOM-511
Portal issue (helm related) - fixed: OOM-486, OOM-441
Guys, follow or use as a reference the scripts below - it will create a rancher environment and install onap on either amsterdam or master (use your own onap-parameters.yaml)
entrypoint: OOM-710
rancher install: OOM-715
onap install: OOM-716
shubhra garg
See the below response.
Santosh Thapa Magar
Hi Michael O'Brien,
Sorry for the trouble.
I am a beginner to ONAP.
I wanted to install ONAP on AWS environment.
But as I went through your video I found I need the onap-parameters.yaml file, which includes the Openstack credentials.
Do I need this for installing ONAP in an AWS environment?
I want to install ONAP on an AWS instance only.
Is it optional, or must I have Openstack credentials?
Please help.
Best Regards,
Santosh
Michael O'Brien
Santosh,
Hi, no, you can put fake user/pass/token strings there for now. When you get to the point of running the use cases - like the vFW - and need to create a customer/tenant/region in AAI, that is where real credentials will be required to authenticate to Keystone. Later, when you orchestrate VNFs via SO, full functionality will be required.
For now use the sample one in the repo.
let us know how things work out. And don't hesitate to ask questions about AWS in your case when bringing up the system.
thank you /michael
Santosh Thapa Magar
Hi Michael O'Brien
Thank you very much for the reply.
I will use the sample you provided.
Before I start installing ONAP, can you please help me understand the need for a domain name for the installation?
Can't I use an Elastic IP only?
And about the use cases, can you let me know which use cases will work under this installation of ONAP on Kubernetes without having Openstack credentials?
Thanks a lot.
Best Regards,
Santosh
Michael O'Brien
You can use a routable IP or a domain - I use a domain so I don't have to remember the IP's of my servers
Santosh Thapa Magar
Hi Michael O'Brien
Sorry to bother you.
I started to install ONAP as per your guidance on AWS; I was able to install rancher, docker and helm.
But when I run cd.sh I get the following errors.
Can you have a look at them and suggest a solution?
PS: I have used the same onap-parameters.yaml provided in this repo.
********************************************************************************************************
root@ip-10-0-1-113:~# ./cd.sh -b release-1.1.0
Wed Jan 31 06:53:31 UTC 2018
provide onap-parameters.yaml and aai-cloud-region-put.json
vm.max_map_count = 262144
remove existing oom
./cd.sh: line 20: oom/kubernetes/oneclick/setenv.bash: No such file or directory
./cd.sh: line 22: oom/kubernetes/oneclick/deleteAll.bash: No such file or directory
Error: incompatible versions client[v2.8.0] server[v2.6.1]
sleeping 1 min
deleting /dockerdata-nfs
chmod: cannot access '/dockerdata-nfs/onap': No such file or directory
pull new oom
Cloning into 'oom'...
fatal: Remote branch release-1.1.0 not found in upstream origin
start config pod
./cd.sh: line 43: oom/kubernetes/oneclick/setenv.bash: No such file or directory
moving onap-parameters.yaml to oom/kubernetes/config
cp: cannot create regular file 'oom/kubernetes/config': No such file or directory
./cd.sh: line 47: cd: oom/kubernetes/config: No such file or directory
./cd.sh: line 48: ./createConfig.sh: No such file or directory
verify onap-config is 0/1 not 1/1 - as in completed - an error pod - means you are missing onap-parameters.yaml or values are not set in it.
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete....
************************************************************************************************
Michael O'Brien
Santosh,
Hi, there are 2 errors above:
fatal: Remote branch release-1.1.0 not found in upstream origin
release-1.1.0 was deleted a month ago - yes I had a comment in my cd.sh script as an example for master or that release - I will update the comment to print "amsterdam" - so there is no confusion
for reference here are the active branches
https://gerrit.onap.org/r/#/admin/projects/oom,branches
the rest of the errors are because the git clone did not work - no files
./cd.sh: line 47: cd: oom/kubernetes/config: No such file or directory
./cd.sh: line 48: ./createConfig.sh: No such file or directory
do the following and you will be ok
./cd.sh -b master
or
./cd.sh -b amsterdam
/michael
Pavan Gupta
Hello,
Any help is appreciated. If required, we can do a remote desktop session.
Michael O'Brien
Pavan,
Check your cd script output.rtf - you are not running the correct helm version (likely you are running 2.3 - should be running 2.6+ - ideally 2.8.0)
For the vnf image pull - have not looked at this - verify the right tag is being pulled from nexus3 and close off the JIRA if you find it.
If you look at your logs - you will see you have the right # of non-running containers (2) but you will notice that some of your createAll calls are failing on the new template tpl code added last week (yes the author of that change should have notified the community of the pending change - I picked up the comm task later that day).
like the following
Error: parse error in "appc/templates/appc-conf-configmap.yaml": template: appc/templates/appc-conf-configmap.yaml:8: function "tpl" not defined
The command helm returned with error code 1
Check this page for the right version - it changed on Wed.
I sent out this notice as well to the onap-discuss newsgroup
https://lists.onap.org/pipermail/onap-discuss/2018-January/007674.html
for
OOM-552
https://gerrit.onap.org/r/#/c/28291/
thank you
/michael
Andrew Fenner
Hi,
I got a closed loop UC running with OOM deployment. I used the workaround for DCAE/VES as outlined in " DCAE mS Deployment (Standalone instantiation) ".
I've attached the helm files I made for this workaround. If you just expand them into ..../oom/kubernetes you should get a directory called ves, and then you can just run ../oneclick/createAll.bash -n onap -a ves
/Andrew
ves-oom.tar
Bharath Thiruveedula
Hi Andrew Fenner, it's nice to see that it works for you. I have an OOM setup without DCAE. Now can I download the ves-oom.tar and create the pod? How can I make other components point to this standalone DCAE model? We have to change vFWCL.zip to give the DCAE collector IP and port, right? Can you give more details on the Closed Loop end?
Andrew Fenner
The file is attached in the last post. The VES and CDAP are integrated with the rest of the other components by the k8s dns. The way to expose the VES port is using:
kubectl expose services ves-vesserver --type=LoadBalancer --port 8080 --target-port=8080 --name=vesfrontend -n onap-ves
I should work out how to add this to the helm templates.
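A hedged helm/yaml equivalent of that kubectl expose command (the selector label is an assumption about how the ves deployment is labelled, not taken from the attached charts):
apiVersion: v1
kind: Service
metadata:
  name: vesfrontend
  namespace: onap-ves
spec:
  type: LoadBalancer
  selector:
    app: ves-vesserver          # assumed pod label
  ports:
  - port: 8080
    targetPort: 8080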
/Andrew
Bharath Thiruveedula
Sure, Andrew. And one more question: are you creating the "/ves/DmaapConfig.json" file? I couldn't find it in the tar. Am I missing something here?
Andrew Fenner
Sorry. I missed explaining that step.
I created a file /dockerdata-nfs/onap/ves/DmaapConfig.json
with the content below.
This overrides the default and means you can update the location of the dmaap host. What's below should work if you have the default namespace names:
{
"channels": [
{
"name": "sec_measurement",
"cambria.topic": "unauthenticated.SEC_MEASUREMENT_OUTPUT",
"class": "HpCambriaOutputStream",
"stripHpId": "true",
"type": "out",
"cambria.hosts": "dmaap.onap-message-router"
},
{
"name": "sec_fault",
"cambria.topic": "unauthenticated.SEC_FAULT_OUTPUT",
"class": "HpCambriaOutputStream",
"stripHpId": "true",
"type": "out",
"cambria.hosts": "dmaap.onap-message-router"
}
]
}
shubhra garg
Where in the onap setup do we need to upload/untar the ves-oom.tar file?
Andrew Fenner
Hi,
I didn't use the vFWCL.zip as I got a different type of closed loop running for an internal VNF.
The files go in .../oom/kubernetes, i.e. alongside the files for all the other namespaces.
You still have to load the TCA application into the CDAP server in much the same way as in the referenced workaround page.
/Andrew
shubhra garg
When we performed the SDNC preload operation for SINK and PG, we noticed that with the modified JSON files for SINK (our values for the VNF details, service instance, etc.), the existing/predefined vFWCL instance got changed. Is that correct?
shubhra garg
Hi All,
We are facing the error (Init:ErrImageNeverPull) for all the ONAP components. Can anybody help with how to rectify the error?
Michael O'Brien
Shubhra,
Image pull errors usually mean you cannot reach nexus3.onap.org - especially that many - which could be your proxy (switch to a cell connection to verify).
Do a manual docker pull to check this.
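A quick way to do that check from the host (take the exact image:tag string from the component's values.yaml; the login step is only needed if anonymous pulls are blocked in your environment):
docker login nexus3.onap.org:10001        # optional, with your nexus credentials
docker pull nexus3.onap.org:10001/<image>:<tag>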
Another reason could be that you did not source setenv.bash, where the docker repo credentials/url are set.
Verify that these are set
shubhra garg
Thank you so much Michael, sourcing the setenv.bash file resolved the issue for most of the ONAP components.
But some of the components like vnc-portal, elasticsearch, pap-policy etc. were still showing the same error.
Doing a manual pull of the image resolved the issue, but for elasticsearch the issue still persists.
For elasticsearch, the below is the system state; I do have the docker image for it in the system, but it still shows the error - ImageNeverPull.
Though for policy-pe and policy-drools, I have pulled the latest docker images manually.
But for brmsgw (onap-policy), I have no idea which image to pull.
Can you suggest something?
onap-policy brmsgw-2679573537-z6pp8 0/1 Init:0/1 36 6h
onap-policy drools-1375106353-1q1ch 0/2 Init:0/1 36 6h
Also, do I need to run the command "docker run image-name" after pulling the images? Where do the latest pulled images go?
I have pulled in the image for vnc-portal, but now the system is NOT showing the docker image for it. What went wrong?
I did a docker pull for the below images, but they are not listed in docker images.
Conclusion:
We have resolved the above mentioned issue by pulling in the docker images , which were missing from the system for the respective components.
Michael O'Brien
Shubhra,
Remember this is Kubernetes, not Docker. Kubernetes is a layer on top of Docker - you don't need to run any docker commands except when installing the Rancher wrapper around Kubernetes - after that, always use kubectl.
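For example, to see which image a pod is actually running (and the pull events for it), stay with kubectl (a sketch - substitute your own namespace and pod name from the 'kubectl get pods' output):
# list the pods for a component, then check the container image and pull events
kubectl get pods -n onap-policy
kubectl describe pod brmsgw-2679573537-z6pp8 -n onap-policy | grep -i image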
Follow the instructions on this wiki "exactly" or use the scripts for your first time install
Pulling docker images yourself is not required - the only reason for the prepull is to speed up the onap startup - for example running the createAll a second time will run faster since the images were pulled earlier.
The images that the values.yaml files reference are the ones pulled automatically by Kubernetes - you don't need later versions unless there are app fixes we have not switched to yet.
If you are having issues with docker pulls then the problem is in your system behind your firewall - I can't remember if it was you (I answer a lot of support questions here) - did you do a proper source of setenv.bash, and is your config pod OK?
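A quick check for the config pod (a sketch - assuming the base "onap" namespace prefix and that the config pod was created there; it normally runs once and finishes):
# -a includes completed pods; the shared config pod should show STATUS Completed
kubectl get pods -n onap -a | grep -i config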
If you really want to see ONAP working - just to verify your procedure - run it on a VM in a public cloud like AWS or Azure and then apply that to your local environment. I am thinking there may be an issue pulling from nexus3 - I have seen this in other corporate environments.
/michael
Hamzeh Khalili
Hi All,
I followed the instructions above to run ONAP on Kubernetes, where the server and client are co-located.
I have two issues regarding the implementation:
1. The policy and portal pods do not appear in the 'kubectl get pods' output - only the following are running:
demo-aai model-loader-service-5c9d84589b-6pz5q 2/2 Running 0 4h
demo-aai search-data-service-6fc58fd7cc-qhzdc 2/2 Running 0 4h
demo-aai sparky-be-684d6759bc-jl5wx 2/2 Running 0 4h
demo-mso mso-6c4dd64bf9-nhdjs 2/2 Running 2 4h
demo-sdnc sdnc-dbhost-0 2/2 Running 1 4h
demo-vid vid-server-56d895b8c-2nctp 2/2 Running 0 3h
2. In the next step, I followed the VNC-portal video, but the portal pod is not available there either. I tried to add the portal, but an error comes up that "the portal already exists". In addition, I looked for the ete-k8s.sh file in dockerdata-nfs, but there are no files except eteshare and robot!
Can anyone help me fix these two issues?
Rahul Sharma
Hamzeh Khalili, Hi,
For 1: Yes, policy and portal should appear in the above 'kubectl' result. I would recommend checking your setenv.bash under $HOME/oom/kubernetes/oneclick and seeing which HELM_APPS you are deploying. Make sure it has policy and portal in there (see the sketch below).
For 2: ete-k8s.sh is present under $HOME/oom/kubernetes/robot, not under dockerdata-nfs. eteshare under dockerdata-nfs/onap/robot will contain the logs of the run when you execute ete-k8s.sh (see below).
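For 1, the HELM_APPS entry in setenv.bash is just a bash array of the components oneclick deploys - roughly along these lines (a trimmed sketch; the exact list depends on your release):
# make sure 'policy' and 'portal' are in the array before running createAll.bash
HELM_APPS=('consul' 'msb' 'mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc')
For 2, the flow is roughly (a sketch, assuming the default install location):
cd $HOME/oom/kubernetes/robot
./ete-k8s.sh health                          # runs the robot health-check suite inside the robot pod
ls /dockerdata-nfs/onap/robot/eteshare/      # the logs from the run land under eteshare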
Hamzeh Khalili
Hi Rahul Sharma
Regarding the first issue: Policy and Portal are there.
Regarding the second issue: I just followed the VNC-portal instructions. The video shows that ete-k8s.sh should appear in dockerdata-nfs when running ./createAll.bash -n demo.
Because of the portal, I cannot check the AAI endpoints or run the health check!
Any idea?
Rahul Sharma
Hamzeh Khalili:
Hamzeh Khalili
Rahul Sharma
I think I have mistakenly created two instances: one based on the instructions provided in ONAP on Kubernetes (onap) and the second based on the vnc-portal instructions (demo). Should I delete one of the instances, for example demo? If yes, please tell me which command I should use!
If I delete one instance, does it affect the other one?
When I ran kubectl get pods -n onap-portal for onap, I received the following messages:
root@omap:~/oom/kubernetes/robot# kubectl get pods -n onap-portal
NAME READY STATUS RESTARTS AGE
portalapps-dd4f99c9b-lbm7w 0/2 Init:Error 0 24m
portaldb-7f8547d599-f2wlv 0/1 CrashLoopBackOff 5 24m
portalwidgets-6f884fd4b4-wl84p 0/1 Init:Error 0 24m
vnc-portal-687cdf7845-clqth 0/1 Init:0/4 1 24m
But for demo it is:
root@omap:~/oom/kubernetes/robot# kubectl get pods -n demo-portal
No resources found.
In the other case, when I run the health check (as you mentioned), I receive the following message:
root@omap:~/oom/kubernetes/robot# ./ete-k8s.sh health
No resources found.
error: expected 'exec POD_NAME COMMAND [ARG1] [ARG2] ... [ARGN]'.
POD_NAME and COMMAND are required arguments for the exec command
See 'kubectl exec -h' for help and examples.
Thanks for your kind help!
Rahul Sharma
I am not sure about the demo-portal. But yes, if the ports are already being used, there would be conflicts when launching a similar pod again.
I would recommend clearing up and starting afresh.
Here is what I would do:
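Roughly (a sketch - adjust the -n prefix to the instance you want to remove, and source setenv.bash first):
cd $HOME/oom/kubernetes/oneclick
source setenv.bash
./deleteAll.bash -n demo -y              # tear down the extra demo instance (repeat for onap for a completely clean start)
kubectl get pods --all-namespaces -a     # wait until the demo-* namespaces and pods are gone
./createAll.bash -n onap                 # then redeploy a single instance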
Hamzeh Khalili
Rahul Sharma
I am trying to delete the containers but I faced the following errors!
Should I first evacuate the host in Rancher or leave it as it is?
root@omap:~/oom/kubernetes/oneclick# ./deleteAll.bash -n demo -y
********** Cleaning up ONAP:
release "demo-consul" deleted
namespace "demo-consul" deleted
clusterrolebinding "demo-consul-admin-binding" deleted
Service account demo-consul-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-msb" deleted
clusterrolebinding "demo-msb-admin-binding" deleted
Service account demo-msb-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-mso" deleted
clusterrolebinding "demo-mso-admin-binding" deleted
Service account demo-mso-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-message-router" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-message-router-admin-binding" not found
Service account demo-message-router-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-sdnc" deleted
clusterrolebinding "demo-sdnc-admin-binding" deleted
Service account demo-sdnc-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-vid" deleted
clusterrolebinding "demo-vid-admin-binding" deleted
Service account demo-vid-admin-binding deleted.
release "demo-robot" deleted
namespace "demo-robot" deleted
clusterrolebinding "demo-robot-admin-binding" deleted
Service account demo-robot-admin-binding deleted.
E0201 09:24:42.090532 5895 portforward.go:331] an error occurred forwarding 32898 -> 44134: error forwarding port 44134 to pod 9b031662eac045462b5e018cc6829467a799568021c3a97dfe8d7ec6272e1064, uid : exit status 1: 2018/02/01 09:24:42 socat[7805] E connect(6, AF=2 127.0.0.1:44134, 16): Connection refused
Error: transport is closing
namespace "demo-portal" deleted
clusterrolebinding "demo-portal-admin-binding" deleted
Service account demo-portal-admin-binding deleted.
Error: release: "demo-policy" not found
namespace "demo-policy" deleted
clusterrolebinding "demo-policy-admin-binding" deleted
Service account demo-policy-admin-binding deleted.
Error: release: "demo-appc" not found
Error from server (NotFound): namespaces "demo-appc" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-appc-admin-binding" not found
Service account demo-appc-admin-binding deleted.
release "demo-aai" deleted
namespace "demo-aai" deleted
clusterrolebinding "demo-aai-admin-binding" deleted
Service account demo-aai-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-sdc" deleted
clusterrolebinding "demo-sdc-admin-binding" deleted
Service account demo-sdc-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-dcaegen2" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-dcaegen2-admin-binding" not found
Service account demo-dcaegen2-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-log" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-log-admin-binding" not found
Service account demo-log-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-cli" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-cli-admin-binding" not found
Service account demo-cli-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-multicloud" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-multicloud-admin-binding" not found
Service account demo-multicloud-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-clamp" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-clamp-admin-binding" not found
Service account demo-clamp-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-vnfsdk" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-vnfsdk-admin-binding" not found
Service account demo-vnfsdk-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-uui" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-uui-admin-binding" not found
Service account demo-uui-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-aaf" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-aaf-admin-binding" not found
Service account demo-aaf-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-vfc" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-vfc-admin-binding" not found
Service account demo-vfc-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-kube2msb" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-kube2msb-admin-binding" not found
Service account demo-kube2msb-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-esr" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-esr-admin-binding" not found
Service account demo-esr-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo" deleted
Waiting for namespaces termination...
Apart from that, I tried to delete both demo and onap, but I did not succeed either.
Here is the error for the second command (./deleteAll.bash -n onap):
root@omap:~/oom/kubernetes/oneclick# ./deleteAll.bash -n demo
Current kubectl context does not match context specified: ONAP
You are about to delete deployment from: ONAP
To continue enter context name: demo
Your response does not match current context! Skipping delete ...
root@omap:~/oom/kubernetes/oneclick#
Michael O'Brien
Some of the earlier errors are normal - I have seen these on half-deployed systems
If the following shows pods still up (other than the 6 for kube-system) even after a helm delete --purge, then you could also start from scratch - delete all of your Kubernetes and Rancher docker containers:
kubectl get pods --all-namespaces -a
Docker DevOps#Dockercleanup
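That cleanup boils down to something like this (a sketch - it removes every container on the host, including the Rancher/Kubernetes ones, so only do it on a dedicated ONAP VM and reinstall Rancher per the wiki afterwards):
docker stop $(docker ps -aq)       # stop every container on the host
docker rm -f $(docker ps -aq)      # then remove them all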
Also try to follow the tutorial here "exactly" if this is your first time running ONAP - or use the included scripts - you won't have any issues that way.
Also, just to be safe: there may be some hardcoding of "onap" - it was hardcoded in places under Helm 2.3 because we could not use the tpl template function until 2.6 (we only upgraded to 2.8 last week).
Hamzeh Khalili
Michael O'Brien
I am totally new to ONAP. I followed the tutorial exactly, but once I tried to add vnc-portal the errors came up, because the vnc-portal instructions say to create a demo instance for the portal, which conflicts with the onap one (it seems that running two instances is complicated!).
As you suggested, I deleted the pods, but one of them is still in Terminating state. Should I ignore that, or should I start from scratch?
root@omap:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
demo-sdnc sdnc-dbhost-0 0/2 Terminating 1 2d
kube-system heapster-76b8cd7b5-z99xr 1/1 Running 0 3d
kube-system kube-dns-5d7b4487c9-zc5tx 3/3 Running 735 3d
kube-system kubernetes-dashboard-f9577fffd-c8bgs 1/1 Running 0 3d
kube-system monitoring-grafana-997796fcf-mgqd9 1/1 Running 0 3d
kube-system monitoring-influxdb-56fdcd96b-pnbrj 1/1 Running 0 3d
kube-system tiller-deploy-74f6f6c747-7cvth 1/1 Running 373 3d
Michael O'Brien
Everything is normal except for the failed SDNC container deletion - I have seen this on another system 2 days ago - something went into master for SDNC that caused this. For that particular machine I deleted the VM and raised a new spot VM - a helm delete --purge had no effect, and even killing the docker container outside of Kubernetes had no effect. I had notes on this and will raise a JIRA - the next system I raised for the CD jobs did not have the issue anymore.
OOM-653