Hi Michael O'Brien, does this include DCAE as well? I think this is the best way to install ONAP. Does it also include any config files for talking to an OpenStack cloud to instantiate VNFs?
I am planning to install ONAP but can't decide which setup to use: the full ONAP setup on VMs, or the Kubernetes-based setup with containers. Will both solutions be developed in the future, or will development continue with only one of them?
I see the recently added update about not being able to pull images because of missing credentials. I encountered this yesterday and was able to get a workaround done by creating the secret and embedding the imagePullSecrets to the *-deployment.yaml file.
In our current environment (namespace 1:1 → service 1:1 → pod 1:1 → docker container) it looks like the following single command will have a global scope (no need to modify individual yaml files - a slight alternative to what you have suggested, which would work as well).
So no code changes which is good. Currently everything seems to be coming up - but my 70G VM is at 99% so we need more HD space.
Edit: actually even though it looked to work
2017-06-30T19:31 UTC 2017-06-30T19:31 UTC pulling image "nexus3.onap.org:10001/openecomp/sdc-elasticsearch:1.0-STAGING-latest" kubelet 172.17.4.99 spec.containers{sdc-es} 2 2017-06-30T19:31 UTC 2017-06-30T19:31 UTC
still getting errors without the namespace for each service like in your example - if we wait long enough
So a better fix Yves and I are testing is to put the line just after the namespace creation in createAll.bash
I'm surprised that it appears to work for you, as it doesn't for my environment. First, you should have to specify the imagePullSecrets for it to work... that can either be done in the yaml or by using the patch serviceaccount command. Second, the scope of the secret for imagePullSecrets is just that namespace:
Pods can only reference image pull secrets in their own namespace, so this process needs to be done one time per namespace.
In your environment, had you previously pulled the images before? I noticed in my environment that it would find a previously pulled image even if I didn't have the authentication credentials. To test that out, I had to add " imagePullPolicy: Always " to the *-deployment.yaml file under the container scope, so it would always try to pull it.
So I think a fix is necessary. I can submit a suggested change to the createAll.bash script that creates the secret and updates the service account in each namespace?
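Since image pull secrets are scoped to a single namespace, the "create the secret, then patch the service account" step has to be repeated once per namespace. A minimal sketch of that loop is below. It only builds the kubectl argument lists so they can be reviewed before anything is run. The secret name onap-docker-registry-key is the one that shows up in the dcaegen2 deployment log further down this page; the helper name, the namespace list and the credential placeholders are assumptions for illustration, not what createAll.bash actually does.

```python
import json
import shlex

REGISTRY = "nexus3.onap.org:10001"
SECRET_NAME = "onap-docker-registry-key"


def pull_secret_commands(namespace, username, password, email):
    """Return the two kubectl commands needed for one namespace."""
    # 1) Create the registry secret inside this namespace.
    create_secret = [
        "kubectl", "--namespace", namespace,
        "create", "secret", "docker-registry", SECRET_NAME,
        "--docker-server=" + REGISTRY,
        "--docker-username=" + username,
        "--docker-password=" + password,
        "--docker-email=" + email,
    ]
    # 2) Attach the secret to the namespace's default service account,
    #    so pods there use it without editing each *-deployment.yaml.
    patch = json.dumps({"imagePullSecrets": [{"name": SECRET_NAME}]})
    patch_account = [
        "kubectl", "--namespace", namespace,
        "patch", "serviceaccount", "default", "-p", patch,
    ]
    return [create_secret, patch_account]


if __name__ == "__main__":
    # Placeholder credentials: substitute your own before running anything.
    for ns in ["onap-aai", "onap-vid", "onap-mso"]:
        for cmd in pull_secret_commands(ns, "<user>", "<password>", "<email>"):
            print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands rather than executing them is deliberate here: it keeps the sketch side-effect free, and the output can be pasted into a shell once the credentials are filled in.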
We previously saw a successful pull from nexus3 - but that turned out to be a leftover mod in my branch yaml for a specific pod.
Yes, I should know in about 10 min (in the middle of a redeploy) if I need to patch - makes sense because it would assume a magical 1:1 association - what if I created several secrets?
I'll adjust and retest.
btw, thanks for working with us getting Kubernetes/oom up!
My test of the updated create_namespace() method eliminated all of the "no credentials" errors. I have plenty of other errors (most seem to be related to the readiness check timing out), but I think this one is licked.
Is there a better way to track this than the comments here? Jira?
Actually our mso images loaded fine after internal retries - bringing up the whole system (except dcae) - so this is without a secret override on the yamls that target nexus3.
It includes your patch line from above
My vagrant vm ran out of HD space at 19G - resizing
won't work on the coreos image - moving up one level of virtualization (docker on virtualbox on vmware-rhel73 in win10) to (docker on virtualbox on win10)
vid still failing on FS
Failed to start container with docker id 47b63e352857 with error: Error response from daemon: {"message":"oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"rootfs_linux.go:54: mounting \\\\\\\"/dockerdata-nfs/onapdemo/vid/vid/lf_config/vid-my.cnf\\\\\\\" to rootfs \\\\\\\"/var/lib/docker/overlay2/0638a5d171ddacf7346133ee5e53104992243e897370bb054383f2e121e5d63f/merged\\\\\\\" at \\\\\\\"/var/lib/docker/overlay2/0638a5d171ddacf7346133ee5e53104992243e897370bb054383f2e121e5d63f/merged/etc/mysql/my.cnf\\\\\\\" caused \\\\\\\"not a directory\\\\\\\"\\\"\"\n: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type"}
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-aai.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal Error syncing pod
vid-mariadb-1108617343-zgnbd onap-vid Waiting: rpc error: code = 2 desc = failed to start container "c4966c8f8dbfdf460ca661afa94adc7f536fd4b33ed3af7a0857ecdeefed1225": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36/etc/mysql/my.cnf\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-vid.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal Error: failed to start container "vid-mariadb": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36/etc/mysql/my.cnf\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""} Error syncing pod
Hi, OOM-3 has been deprecated (it is in the closed state) - the secrets fix is implemented differently now - you don't need the workaround.
Also the search line limits is a bug in rancher that you can ignore - it is warning that more than 5 dns search terms were used - not an issue - see my other comments on this page
The only real issue is "Error syncing pod" this is an intermittent timing issue (most likely) that we are working on - a faster/more-cores system should see less of this.
If you only have 2 working pods - you might not have run the config-init pod - verify you have /dockerdata-nfs on your host FS.
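The "not a directory" mount failures quoted above are consistent with a config file that is missing on the host, and the error text itself suggests checking that the host path exists and is the expected type. A minimal sketch of such a check is below, assuming the config-init pod is what should have populated /dockerdata-nfs. The function names are made up for illustration; the path is copied from the vid-mariadb error in this thread.

```python
import os


def classify_host_path(path):
    """Describe what exists at `path`: 'file', 'directory' or 'missing'."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "directory"
    return "missing"


def check_file_mounts(paths):
    """Return {path: state} for every path that is not a regular file."""
    problems = {}
    for path in paths:
        state = classify_host_path(path)
        if state != "file":
            problems[path] = state
    return problems


if __name__ == "__main__":
    # Host-side file that the vid-mariadb container mounts as my.cnf.
    expected_files = ["/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf"]
    for path, state in check_file_mounts(expected_files).items():
        print("%s: expected a file, found %s" % (path, state))
```

If the path comes back as "missing" or "directory", that would point at the config mounts not having been created before the pod started, which matches the advice to run the config-init pod first.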
but yes, again many of them are stuck with the same error: "Error syncing pod"
and yes, the server I am using now has 128GB RAM. (I have configured the proxy as best I know how, but do you think this could also be related to the proxy? If so I will dig more in that direction.)
Update: containers are loading now - for example both pods for VID come up ok if we first run the config-init pod to bring up the config mounts. Also there is an issue with unresolved DNS entries that is fixed temporarily by adding to /etc/resolv.conf
Good news – 32 of 33 pods are up (sdnc-portal is going through a restart).
Ran 2 parallel Rancher systems on 48G Ubuntu 16.04.2 VMs on two 64G servers
Stats: Without DCAE (which is up to 40% of ONAP) we run at 33G – so I would expect a full system to be around 50G which means we can run on a P70 Thinkpad laptop with 64G.
Had to add some dns-search domains for k8s in interfaces to appear in resolv.conf after running the config pod.
Issues:
after these 2 config changes the pods come up within 25 min except policy-drools which takes 45 min (on 1 machine but not the other) and sdnc-portal (which is having issues with some node downloads)
Michael O'Brien - (deprecated as of 20170508) - use obrienlabs I've got to the point where I can access the portal login page, but after entering the credentials it keeps redirecting to port 8989 and fails, instead of using the externally mapped port (30215 in my case). Any thoughts?
i'm running on GCE with 40GB and only running sdc, message-router and portal for now.
I ran the OOM installation from scratch and managed to log in to Portal by changing the port back to 30215 after the login redirect.
Also when i logged in with cs0008 user and click on SDC, i have: "can’t establish a connection to the server at sdc.api.simpledemo.onap.org:8181" (should be changed to port 30206?)
Do you know which config has to be changed for this?
Are you accessing the ECOMP Portal via the 'onap-portal vnc-portal-1027553126-h6dhd' container?
This container was added to the standard ONAP deployment so one may VNC into the ONAP Deployment instance (namespace) and have networking fully resolved within K8s.
The Docker processes are not starting on their own, maybe because an internet proxy is being used. I am trying to run the install and setup manually by logging in to each component.
Hi, there are a combination of files - some are in the container itself - see /var/opt
some are off the shared file system on the host - see /dockerdata-nfs
In the case of robot - you have spun up one pod - each pod has a single docker container, to see the other pods/containers - kubectl into each like you have into robot - just change the pod name. kubectl is an abstraction on top of docker - so you don't need to directly access docker containers.
Yes, I can see the mounted directories and found robot_install.sh in /var/opt/OpenECOMP_ETE/demo/boot
On the K8s Dashboard and CLI the pods are in the Running state, but when I log in to any of them (via kubectl) I can't see any docker process running via docker ps. (Docker itself is not even installed.)
I think ideally this is taken care of by the pod itself, right? Or do we need to go inside each component and run that component's installation script?
Vaibhav, Hi, the architecture of kubernetes is such that it manages docker containers - we are not running docker on docker. Docker ps will only be possible on the host machine(s)/vm(s) that kubernetes is running on - you will see the wrapper docker containers running the kubernetes and rancher undercloud.
When you "kubectl exec -it" - into a pod you have entered a docker container the same as a "docker exec -it" at that point you are in a container process, try doing a "ps -ef | grep java" to see if a java process is running for example. Note that by the nature of docker most containers will have a minimal linux install - so some do not include the ps command for example.
If you check the instructions above you will see the first step is to install docker 1.12 only on the host - as you end up with 1 or more hosts running a set of docker containers after ./createAll.bash finishes
example - try the mso jboss container - it is one of the heavyweight containers
if you want to see the k8s wrapped containers - do a docker ps on the host
root@ip-172-31-93-122:~# docker ps | grep mso
9fed2b7ebd1d nexus3.onap.org:10001/openecomp/mso@sha256:ab3a447956577a0f339751fb63cc2659e58b9f5290852a90f09f7ed426835abe "/docker-files/script" 4 days ago Up 4 days k8s_mso_mso-371905462-w0mcj_onap-mso_11da22bf-8b3d-11e7-9e1a-0289899d0a5f_0
e4171a2b73d8 nexus3.onap.org:10001/mariadb@sha256:3821f92155bf4311a59b7ec6219b79cbf9a42c75805000a7c8fe5d9f3ad28276 "/docker-entrypoint.s" 4 days ago Up 4 days k8s_mariadb_mariadb-786536066-87g9d_onap-mso_11bc6958-8b3d-11e7-9e1a-0289899d0a5f_0
f099c5613bf1 gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 days ago Up 4 days k8s_POD_mariadb-786536066-87g9d_onap-mso_11bc6958-8b3d-11e7-9e1a-0289899d0a5f_0
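The names in the last column of that docker ps output appear to follow the pattern k8s_<container>_<pod>_<namespace>_<pod uid>_<attempt>, which is handy for mapping a host-level container back to its pod. A small sketch that splits the name on that assumption is below. The layout matches the three rows above but is not guaranteed across Kubernetes versions, and the function and field names are just illustrative.

```python
from collections import namedtuple

K8sContainer = namedtuple(
    "K8sContainer", "container pod namespace pod_uid attempt")


def parse_k8s_container_name(name):
    """Split a kubelet-generated docker container name into its parts."""
    parts = name.split("_")
    if len(parts) != 6 or parts[0] != "k8s":
        raise ValueError("not a kubelet-managed container name: %r" % name)
    _, container, pod, namespace, pod_uid, attempt = parts
    return K8sContainer(container, pod, namespace, pod_uid, int(attempt))


if __name__ == "__main__":
    name = ("k8s_mso_mso-371905462-w0mcj_onap-mso_"
            "11da22bf-8b3d-11e7-9e1a-0289899d0a5f_0")
    info = parse_k8s_container_name(name)
    # The pod and namespace can then be passed to kubectl, for example:
    print("kubectl -n %s exec -it %s bash" % (info.namespace, info.pod))
```

The row whose container part is POD is the pause container that holds the pod's network namespace, which is why each pod shows one more docker container than it has application containers.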
Hi all, I am new to kubernetes installation of ONAP and have problems cloning onap repository. I have tried git clone -b release-1.0.0 http://gerrit.onap.org/r/oom but ended up with the following error fatal: unable to access 'http://gerrit.onap.org/r/oom/': The requested URL returned error: 403
I also tried to use ssh git clone -b release-1.0.0 ssh://cnleng@gerrit.onap.org:29418/oom but I cannot access settings on https://gerrit.onap.org (Already have an account on Linux foundation) to copy my ssh keys Any help will be appreciated. Thanks
Hi, I am trying to install ONAP components through oom, but getting the following errors:
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-appc.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal
I tried to edit /etc/resolv.conf according to Michael's comment above:
Geora, hi, that is a red herring unfortunately - there is a bug in rancher where they add more than 5 domains to the search tree - you can ignore these - the resolv.conf change turns out to have no effect - it has been removed except in the comment history
Has anyone managed to run ONAP on Kubernetes with more than one node? i'm unclear about how the /dockerdata-nfs volume mount works in the case of multiple nodes.
1) in my azure setup, i have one master node and 4 agent nodes (Standard D3 - 4CPU/ 14GB). after running the config-init pod (and completing) i do not see the /dockerdata-nfs directory being created on the master node. i am not sure how to check this directory on all the agent nodes. Is this directory expected to be created on all the agent nodes? if so, are they kept synchronized?
2) after the cluster is restarted/ there is a possibility that pods will run on different set of nodes, so if the /dockerdata-nfs is not kept in sync between the agent nodes, then the data will not be persisted.
ps: i did not use rancher. i created the k8s cluster using acs-engine.
The mounting of the shared dockerdata-nfs volume does not appear to happen automatically. You can install nfs-kernel-server and mount a shared drive manually. If you are running rancher on the master node (the one with the files in the /dockerdata-nfs directory), mount that directory on the agent nodes:
On Master:
# apt-get install nfs-kernel-server
Modify /etc/exports to share directory from master to agent nodes
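For anyone unsure what the /etc/exports entry might look like, here is a rough sketch that formats one for a list of agent nodes, plus the matching mount command for an agent. The IP addresses and the export options (rw, sync, no_subtree_check, no_root_squash) are assumptions chosen for illustration and are not taken from this page; review them against your own security requirements before using them.

```python
DEFAULT_OPTIONS = "rw,sync,no_subtree_check,no_root_squash"


def exports_line(directory, agent_ips, options=DEFAULT_OPTIONS):
    """Build one /etc/exports entry sharing `directory` with each agent."""
    clients = " ".join("%s(%s)" % (ip, options) for ip in agent_ips)
    return "%s %s" % (directory, clients)


def agent_mount_command(master_ip, directory):
    """Command an agent node could run to mount the master's export."""
    return ["mount", "-t", "nfs", "%s:%s" % (master_ip, directory), directory]


if __name__ == "__main__":
    # Hypothetical addresses: one master and two agents.
    print(exports_line("/dockerdata-nfs", ["10.0.0.11", "10.0.0.12"]))
    print(" ".join(agent_mount_command("10.0.0.10", "/dockerdata-nfs")))
```

After editing /etc/exports on the master, the export table normally has to be reloaded (for example with exportfs -ra) before the agents can mount it.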
I am trying to install ONAP on Kubernetes and I got the following error while trying to run ./createAll.bash -n onap -a robot|appc|aai command:
Command 'mppc' from package 'makepp' (universe)
Command 'ppc' from package 'pearpc' (universe)
appc: command not found
No command 'aai' found, did you mean:
Command 'axi' from package 'afnix' (universe)
Command 'ali' from package 'nmh' (universe)
Command 'ali' from package 'mailutils-mh' (universe)
Command 'aa' from package 'astronomical-almanac' (universe)
Command 'fai' from package 'fai-client' (universe)
Command 'cai' from package 'emboss' (universe)
aai: command not found
Does anyone have an idea? (kubernetes /helm is already up and running)
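The "appc: command not found" and "aai: command not found" lines suggest the shell treated the | characters as pipes, so only "robot" reached createAll.bash and the shell then tried to run appc and aai as separate programs. The sketch below uses Python's shlex to show roughly how such a line is tokenized, and builds one invocation per component instead. It assumes -a takes a single component per run, as in the msb and aai commands quoted elsewhere on this page; the helper names are made up.

```python
import shlex


def split_pipeline(command_line):
    """Split a command line at '|' roughly the way a POSIX shell would."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    commands, current = [], []
    for token in lexer:
        if token == "|":
            # A pipe ends one command and starts the next.
            commands.append(current)
            current = []
        else:
            current.append(token)
    commands.append(current)
    return commands


def per_component_commands(namespace, components):
    """Build one createAll.bash invocation per component."""
    return [["./createAll.bash", "-n", namespace, "-a", component]
            for component in components]


if __name__ == "__main__":
    for cmd in split_pipeline("./createAll.bash -n onap -a robot|appc|aai"):
        print("shell would run:", cmd)
    for cmd in per_component_commands("onap", ["robot", "appc", "aai"]):
        print("try instead:", " ".join(cmd))
```

In other words, running the script three times (once each for robot, appc and aai) should avoid the "command not found" errors.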
Hi, Michael O'Brien .I am trying to install ONAP through the way above and encountered a problem.
The hbase pod in Kubernetes reports “Readiness probe failed: dial tcp 10.42.76.162:8020: getsockopt: connection refused”. It seems the hbase service did not start as expected. The log of the hbase container in Rancher shows:
Starting namenodes on [hbase]
hbase: chown: missing operand after '/opt/hadoop-2.7.2/logs'
hbase: Try 'chown --help' for more information.
hbase: starting namenode, logging to /opt/hadoop-2.7.2/logs/hadoop--namenode-hbase.out
localhost: starting datanode, logging to /opt/hadoop-2.7.2/logs/hadoop--datanode-hbase.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.2/logs/hadoop--secondarynamenode-hbase.out
starting zookeeper, logging to /opt/hbase-1.2.3/bin/../logs/hbase--zookeeper-hbase.out
starting master, logging to /opt/hbase-1.2.3/bin/../logs/hbase--master-hbase.out
starting regionserver, logging to /opt/hbase-1.2.3/bin/../logs/hbase--1-regionserver-hbase.out
Nexus3 usually has intermittent connection issues - you may have to wait up to 30 min. Yesterday I was able to bring it up on 3 systems with the 20170906 tag (all outside the firewall)
I assume MSO (earlier in the startup) worked - so you don't have a proxy issue
Sent: Friday, September 8, 2017 14:36 To:onap-discuss@lists.onap.org Subject: [onap-discuss] [oom] config pod changes
OOM users,
I’ve just pushed a change that requires a re-build of the /dockerdata-nfs/onap/ mount on your K8s host.
Basically, what I’ve tried to do is port over the heat stack version of ONAP’s configuration mechanism. The heat way of running ONAP writes files to /opt/config/ based on the stack’s environment file, which has the details of each user’s environment. These values are then swapped into the various VMs’ containers using scripts.
Now that we are using helm for OOM, I was able to do something similar in order to start trying to run the vFW/vLB demo use cases.
I have also been made aware that this change requires K8s 1.6 as I am making use of the “envFrom” https://kubernetes.io/docs/api-reference/v1.6/#container-v1-core. We stated earlier that we are setting minimum requirements of K8s 1.7 and rancher 1.6 for OOM so hopefully this isn’t a big issue.
It boils down to this:
/oom/kubernetes/config/onap-parameters.yaml is kind of like file “onap_openstackRC.env” and you will need to define some required values otherwise the config pod deployment will fail.
1. I am trying to install ONAP on Kubernetes and encountered a problem.
I created the msb pods first with the command "./createAll.bash -n onap -a msb", then created the aai pods with the command "./createAll.bash -n onap -a aai". The problem is that the aai serviceName and url entries do not register with msb as expected. I found that the code of the aai project has these lines "
Goal: I want to deploy and manage vFirewall router using ONAP.
I installed ONAP on Kubernetes using oom(release-1.0.0). All Services are running except DCAE as it is not yet completely implemented in Kubernetes. Also, I have an OpenStack cluster configured separately.
How can I integrate DCAE to the above Kubernetes cluster?
{"log":"Waiting for resources to be up\n","stream":"stdout","time":"2017-09-21T18:23:53.274547381Z"}
{"log":"aai-resources.api.simpledemo.openecomp.org: forward host lookup failed: Unknown host\n","stream":"stderr","time":"2017-09-21T18:23:58.279615776Z"}
{"log":"Waiting for resources to be up\n","stream":"stdout","time":"2017-09-21T18:23:58.279690784Z"}
I am using OOM version 1.1.0. I have pre-pulled all the images using prepull_docker.sh. But after creating the pods using the createAll.sh script, all the pods come up except DCAE. Is DCAE supported in the 1.1.0 release? If not, when is it expected to be functional? Will I be able to run the vFW closed-loop demo without DCAE?
More details below:
The DCAE specific images shown are:
root@hcl:~# docker images | grep dcae
nexus3.onap.org:10001/openecomp/dcae-controller 1.1-STAGING-latest ff839a80b8f1 12 weeks ago 694.6 MB
nexus3.onap.org:10001/openecomp/dcae-collector-common-event 1.1-STAGING-latest e3daaf41111b 12 weeks ago 537.3 MB
nexus3.onap.org:10001/openecomp/dcae-dmaapbc 1.1-STAGING-latest 1fcf5b48d63b 7 months ago 328.1 MB
The DCAE health check is failing
Starting Xvfb on display :88 with res 1280x1024x24
ConnectionError: HTTPConnectionPool(host='dcae-controller.onap-dcae', port=8080): Max retries exceeded with url: /healthcheck (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f26aee31550>: Failed to establish a new connection: [Errno -2] Name or service not known',))
Vidhu, hi, DCAE was in 1.0 of OOM on 28 Sept 2017 - however for R1/Amsterdam the new project DCAEGEN2 was only done in HEAT. There is an effort to move the containers to Kubernetes, an effort to use the developer setup with 1 instead of 7 cdap hadoop nodes and an effort to complete the bridge between the hybrid HEAT/Kubernetes setup - specific only to DCAEGEN2. One or more of these should be in shortly as we work with the DCAE team. You are welcome to help both teams with this large effort.
While oneclick/createAll.bash includes DCAEGEN2 pod creation, the automation script cd.sh hits the ERROR condition when creating DCAEGEN2 because createAll.bash expects /home/ubuntu/.ssh/onap_rsa to exist. Here's some output from the console log of one of today's Jenkins runs (http://jenkins.onap.info/job/oom-cd/1853/consoleFull):
19:21:03 ********** Creating deployments for dcaegen2 **********
19:21:03
19:21:03 Creating namespace **********
19:21:03 namespace "onap-dcaegen2" created
19:21:03
19:21:03 Creating service account **********
19:21:03 clusterrolebinding "onap-dcaegen2-admin-binding" created
19:21:03
19:21:03 Creating registry secret **********
19:21:03 secret "onap-docker-registry-key" created
19:21:03
19:21:03 Creating deployments and services **********
19:21:03 ERROR: /home/ubuntu/.ssh/onap_rsa does not exist or is empty. Cannot launch dcae gen2.
19:21:03 ERROR: dcaegen2 failed to configure: Pre-requisites not met. Skipping deploying it and continue
19:21:04
19:21:04
Yes, DCAEGEN2 works via OOM - I verified it last Friday. However, only in the amsterdam release with the proper onap-parameters.yaml (will be ported to Beijing/master shortly).
Hi, I sense there is some information missing here that I would be happy to acquire.
There is a file that describes the ONAP environment, "onap-parameters.yaml". I think it would be good practice to document how to fill it in (or how to obtain the values that should go in it).
Mor, You are welcome to help us finish the documentation for OOM-277
The config was changed on Friday - those of us here are playing catch up on some of the infrastructure changes as we are testing the deploys every couple of days - you are welcome to add to the documentation here - usually the first to encounter an issue/workaround documents it - so the rest of us can benefit.
Most of the content on this tutorial is added by developers like yourself that would like to get OOM deployed and fully functional - at ONAP we self document anything that is missing
There was a section added on friday for those switching from the old-style config to the new - you run a helm purge
The configuration parameters will be specific to your rackspace/openstack config - usually you match your rc export. There is a sample posted from before when it was in the json file in mso - see the screen cap.
The major issue is that so far no one using pure public ONAP has actually deployed a vFirewall yet (mostly due to stability issues with ONAP that are being fixed)
First verify that your portal containers are running in K8s (including the vnc-portal). Take note of the 2/2 and 1/1 Ready states. If the number on the left is 0 (or lower than the number on the right), the pod's containers are not all running yet.
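If you would rather not eyeball the Ready column, a few lines of Python can flag the pods that are not at N/N yet. This is only a sketch: it assumes the plain-text table printed by kubectl get pods for a single namespace, where the first two columns are NAME and READY, and the sample rows below are made up apart from the vnc-portal pod name mentioned earlier on this page.

```python
def is_fully_ready(ready_column):
    """True when a READY value such as '2/2' shows every container ready."""
    ready, total = (int(part) for part in ready_column.split("/"))
    return total > 0 and ready == total


def pods_not_ready(kubectl_get_pods_output):
    """Return the names of pods whose READY column is not N/N."""
    rows = [line.split()
            for line in kubectl_get_pods_output.splitlines()
            if line.strip()]
    # rows[0] is the header line (NAME READY STATUS RESTARTS AGE).
    return [row[0] for row in rows[1:] if not is_fully_ready(row[1])]


if __name__ == "__main__":
    # Hypothetical output, for illustration only.
    sample = """NAME                          READY  STATUS    RESTARTS  AGE
vnc-portal-1027553126-h6dhd   0/1    Init:0/5  0         2m
example-portalapps-pod        2/2    Running   0         2m
"""
    print(pods_not_ready(sample))
```

The same check could be fed from the real command output, for example by piping kubectl get pods -n onap-portal into a script that reads standard input.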
Hi, is there a page available where we could find any sort of updated list/diagram of the dependencies between the different onap components? Also is there a breakdown of the memory requirements for the various oom components?
No official documentation on the dependencies at this point. But a very good idea to add. I will look into doing this.
For now you can see the dependencies in each of the deployment descriptors like in the AAI traversal example (see below) that depends on aai-resource and hbase containers before it starts up. In OOM we make use of Kubernetes init-containers and readiness probes to implement the dependencies. This prevents the main container in the deployment descriptor from starting until its dependencies are "ready".
oom/kubernetes/aai/templates] vi aai-traversal-deployment.yaml
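To illustrate the gating idea outside Kubernetes: a readiness-style check boils down to "poll the dependency until it accepts connections, and only then let the main container start". A rough Python sketch of a TCP check is below. It is not the OOM init-container code; the function name, timings and the example host name are assumptions. Port 8020 is simply the hbase port that appears in the readiness probe failure quoted earlier on this page.

```python
import socket
import time


def wait_for_tcp(host, port, timeout_s=60.0, interval_s=2.0):
    """Poll host:port until a TCP connect succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means the dependency is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical dependency address; adjust to your own service name.
    if wait_for_tcp("hbase.onap-aai", 8020, timeout_s=10.0):
        print("dependency is up, main container could start")
    else:
        print("dependency not ready, keep waiting")
```

A real probe would usually also check that the service answers sensibly (for example an HTTP health endpoint), since an open port only shows that something is listening.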
Samuel, To add to the dependency discussion by Mike - Ideally I would like to continue the deployment diagram below with the dependencies listed in the yamls he refers to
The diagram can be edited by anyone - I will take time this week and update it.
Hi, VFC is still a work in progress - the VFC team is working through issues with their containers. You don't currently need VFC for ONAP to function - you can comment it out of the oneclick/setenv.bash helm line (ideally we would leave out services that are still WIP).
I am trying to bring up ONAP using Kubernetes. Can you please tell me whether I should pull only OOM release-1.0.0, or whether a pull from the master branch should also be fine, to get ONAP up and running and to run the demo on it?
Rajesh, Hi, the latest master is 1.1/R1 - the wiki is now targeting 1.1 - I'll remove the 1.0 link. Be aware that ONAP in general is undergoing stabilization at this point.
I am getting the same error as a few people above when it comes to accessing SDC where it says I am not authorized to view this page, and it also gives me a 500 error. My initial impression is that this might be because I cannot reach the IP corresponding to the sdc.api.simpledemo.openecomp.org in the /etc/hosts file from my vnc container.
Could anybody confirm if this may cause an issue? And if so, which container/host/service IP should be paired with the sdc url?
Actually, I believe the resolution is correct, as it maps to the sdc-fe service, and if I change the IP to any other service the sdc web page times out. Also, if I curl <sdc-url>:8080 I do get information back. I am still not sure what might be causing this issue. Currently I am trying to look through the sdc logs for hints, but no luck as of yet
actually those are for sdc-be, I see a chef error on sdc-es - but the pod starts up ok (need to verify the endpoints though) - also this pod is not slated for the elk filebeat sister container - it should
[2017-10-14T11:06:17-05:00] ERROR: cookbook_file[/usr/share/elasticsearch/config/kibana_dashboard_virtualization.json] (sdc-elasticsearch::ES_6_create_kibana_dashboard_virtualization line 1) had an error: Chef::Exceptions::FileNotFound: Cookbook 'sdc-elasticsearch' (0.0.0) does not contain a file at any of these locations:
files/debian-8.6/kibana_dashboard_virtualization.json
files/debian/kibana_dashboard_virtualization.json
files/default/kibana_dashboard_virtualization.json
files/kibana_dashboard_virtualization.json
This cookbook _does_ contain: ['files/default/dashboard_BI-Dashboard.json','files/default/dashboard_Monitoring-Dashboared.json','files/default/visualization_JVM-used-CPU.json','files/default/visualization_JVM-used-Threads-Num.json','files/default/visualization_number-of-user-accesses.json','files/default/logging.yml','files/default/visualization_JVM-used-Memory.json','files/default/visualization_host-used-Threads-Num.json','files/default/visualization_Show-all-certified-services-ampersand-resources-(per-day).json','files/default/visualization_Show-all-created-Resources-slash-Services-slash-Products.json','files/default/visualization_host-used-CPU.json','files/default/visualization_Show-all-distributed-services.json']
[2017-10-14T11:06:17-05:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
getting a chef exit on missing elk components in sdc-es - even though this one is not slated for the sister filebeat container - likely a reused script across all pods in sdc - will take a look
root@obriensystemsu0:~/onap/oom/kubernetes/oneclick# kubectl logs -f -n onap-aai aai-traversal-3982333463-vb89g aai-traversal
Cloning into 'aai-config'...
[2017-10-14T10:50:36-05:00] INFO: Started chef-zero at chefzero://localhost:1 with repository at /var/chef/aai-config
One version per cookbook
environments at /var/chef/aai-data/environments
[2017-10-14T10:50:36-05:00] INFO: Forking chef instance to converge... Starting Chef Client, version 13.4.24 [2017-10-14T10:50:36-05:00] INFO: *** Chef 13.4.24 *** [2017-10-14T10:50:36-05:00] INFO: Platform: x86_64-linux [2017-10-14T10:50:36-05:00] INFO: Chef-client pid: 43 [
I am trying to set up ONAP using Kubernetes. I am using Rancher to set up the Kubernetes cluster. I have 5 machines with 16GB memory each, and Kubernetes is configured successfully. When I run createAll.bash to set up the ONAP application, some of the components are configured and running, but others fail with an "ImagePullBackOff" error.
When I pull the images independently I can download them from nexus successfully, but not when running through the createAll script. I went through the script and everything seems fine, so I can't work out what is wrong. Could you please help me understand the issue?
Hi, try running the docker pre pull script on all of your machines first. Also you may need to duplicate /dockerdata-nfs across all machines - manually or via a shared drive.
Yes, we have been getting this since last Friday - I have been too busy to raise an issue like normal - this is not as simple as onap-parameters.yaml, it looks like a robot change related to the SO rename - will post a JIRA/workaround shortly. In any case SO is not fully up on OOM/Heat currently.
I have brought up ONAP using the OOM master branch, which I pulled yesterday. But on running the health check I am facing similar issues to those discussed above: MSO fails with a 503 error, and I also see portal failing with a 404 error.
Can you please let us know if there is any workaround for this issue, or whether there is any build where the components needed for running the vFW/vDNS demos (portal, SDC, AAI, SO, VID, SDNC, Policy and DCAE) are healthy?
how do I set/correct the missing values in the health check? How do I know if everything should be working with a current deployment?
root@onap-oom-all-in-one:/dockerdata-nfs/onap/robot# ./ete-docker.sh health
Starting Xvfb on display :88 with res 1280x1024x24
Executing robot tests at log level TRACE
==============================================================================
OpenECOMP ETE
==============================================================================
OpenECOMP ETE.Robot
==============================================================================
OpenECOMP ETE.Robot.Testsuites
==============================================================================
[ ERROR ] Error in file '/var/opt/OpenECOMP_ETE/robot/resources/clamp_interface.robot': Setting variable '${CLAMP_ENDPOINT}' failed: Variable '${GLOBAL_CLAMP_SERVER_PROTOCOL}' not found. Did you mean:
${GLOBAL_DCAE_SERVER_PROTOCOL}
${GLOBAL_APPC_SERVER_PROTOCOL}
${GLOBAL_MR_SERVER_PROTOCOL}
${GLOBAL_MSO_SERVER_PROTOCOL}
${GLOBAL_AAI_SERVER_PROTOCOL}
${GLOBAL_ASDC_SERVER_PROTOCOL}
[ ERROR ] Error in file '/var/opt/OpenECOMP_ETE/robot/resources/msb_interface.robot': Setting variable '${MSB_ENDPOINT}' failed: Variable '${GLOBAL_MSB_SERVER_PROTOCOL}' not found. Did you mean:
${GLOBAL_MSO_SERVER_PROTOCOL}
${GLOBAL_MR_SERVER_PROTOCOL}
${GLOBAL_ASDC_SERVER_PROTOCOL}
${GLOBAL_SDNGC_SERVER_PROTOCOL}
${GLOBAL_VID_SERVER_PROTOCOL}
${GLOBAL_AAI_SERVER_PROTOCOL}
${GLOBAL_DCAE_SERVER_PROTOCOL}
${GLOBAL_APPC_SERVER_PROTOCOL}
OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp components are...
==============================================================================
Basic DCAE Health Check [ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e8f955fd0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /gui
[ WARN ] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e8fe14350>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /gui
[ WARN ] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e8fda87d0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /gui
| FAIL |
ConnectionError: HTTPConnectionPool(host='dcae-controller.onap-dcae', port=9998): Max retries exceeded with url: /gui (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2e8de52250>: Failed to establish a new connection: [Errno -2] Name or service not known',))
------------------------------------------------------------------------------
Basic SDNGC Health Check | PASS |
------------------------------------------------------------------------------
Basic A&AI Health Check | PASS |
------------------------------------------------------------------------------
Basic Policy Health Check | PASS |
------------------------------------------------------------------------------
Basic MSO Health Check | FAIL |
503 != 200
------------------------------------------------------------------------------
Basic ASDC Health Check | PASS |
------------------------------------------------------------------------------
Basic APPC Health Check | PASS |
------------------------------------------------------------------------------
Basic Portal Health Check | PASS |
------------------------------------------------------------------------------
Basic Message Router Health Check | PASS |
------------------------------------------------------------------------------
Basic VID Health Check | PASS |
------------------------------------------------------------------------------
Basic Microservice Bus Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
Basic CLAMP Health Check | FAIL |
Variable '${CLAMP_ENDPOINT}' not found.
------------------------------------------------------------------------------
catalog API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
emsdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
gvnfmdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
huaweivnfmdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
jujuvnfmdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
multicloud API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
multicloud-ocata API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
multicloud-titanium_cloud API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
multicloud-vio API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
nokiavnfmdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
nslcm API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
resmgr API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
usecaseui-gui API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
vnflcm API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
vnfmgr API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
vnfres API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
workflow API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
ztesdncdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
ztevmanagerdriver API Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp compo... | FAIL |
31 critical tests, 8 passed, 23 failed
31 tests total, 8 passed, 23 failed
==============================================================================
OpenECOMP ETE.Robot.Testsuites | FAIL |
31 critical tests, 8 passed, 23 failed
31 tests total, 8 passed, 23 failed
==============================================================================
OpenECOMP ETE.Robot | FAIL |
31 critical tests, 8 passed, 23 failed
31 tests total, 8 passed, 23 failed
==============================================================================
OpenECOMP ETE | FAIL |
31 critical tests, 8 passed, 23 failed
31 tests total, 8 passed, 23 failed
==============================================================================
Output: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_14804/output.xml
Log: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_14804/log.html
Report: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_14804/report.html
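For anyone who wants to pull the failing checks out of a console dump like the one above, here is a minimal sketch in Python. It assumes the result lines keep the "name | PASS |" / "name | FAIL |" shape shown in the log; the function name and the sample text are made up for illustration and are not part of the robot scripts.

```python
import re

# Matches console result lines such as "Basic MSO Health Check | FAIL |".
RESULT_LINE = re.compile(r"^(?P<name>.+?)\s*\|\s*(?P<status>PASS|FAIL)\s*\|\s*$")


def tally_health_checks(console_text):
    """Return {test name: status} for the individual health checks."""
    results = {}
    for line in console_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if not match:
            continue
        name = match.group("name").strip()
        # Skip suite roll-up lines like "OpenECOMP ETE.Robot | FAIL |".
        if name.startswith("OpenECOMP ETE"):
            continue
        results[name] = match.group("status")
    return results


# Shortened sample in the same shape as the log above.
sample = """\
Basic SDNGC Health Check | PASS |
Basic MSO Health Check | FAIL |
503 != 200
OpenECOMP ETE.Robot.Testsuites | FAIL |
"""
results = tally_health_checks(sample)
failed = sorted(name for name, status in results.items() if status == "FAIL")
print(failed)
```

Note that a check whose status is pushed onto its own line by warnings (the DCAE check above) is not captured by this simple pattern.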
A persistent NFS mount is recommended in the official docs - this is a collaborative wiki - as in join the party of overly enthusiastic developers - in my case I run on AWS EBS so not an issue - you are welcome to help document the ecosystem.
The sky at OOM is a very nice shade of blue!
Sorry I am super excited about the upcoming developer conference on 11 Dec.
In my setup, I am able to start the ONAP components only if all the images already are downloaded using prepull_docker.sh. So far, I have been able to start all aai components using "createAll.bash -n onap -a aai" after the images have been downloaded using prepull_docker.sh.
Here are the challenges I am facing
"nexus3.onap.org:10001/onap/clamp" is downloaded in the local docker repository but "kubectl get pods --all-namespaces | grep clamp" fails with the following error
Thanks Beili. Below is the error I get for clamp. Looks like clamp is expecting some configuration, specifically password. Any clues on the specific configuration which needs to be updated?
*************************** APPLICATION FAILED TO START ***************************
Description:
Binding to target org.onap.clamp.clds.config.EncodedPasswordBasicDataSource@53ec2968 failed:
Property: spring.datasource.camunda.password Value: strong_pitchou Reason: Property 'password' threw exception; nested exception is java.lang.NumberFormatException: For input string: "st"
Use the recommended subset (essentially the ONAP 1.0 components from the original seed code in Feb 2017, which work with the vFirewall use case) until we stabilize the R1 release.
Clamp, aaf, and vfc are still being developed, so there are usually a few pod failures in these components. I will post the JIRAs. These are known issues being worked on in the OOM JIRA board.
You don't need these 3 components to run the vFirewall - for now I would exclude them in HELM_APPS in setenv.bash - later when they are stable you can add them back.
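As a rough illustration of the exclusion, the sketch below filters a component list and prints a replacement line. The component list here is hypothetical (the real HELM_APPS value lives in setenv.bash), and the bash array form of the output is an assumption about how that file declares it.

```python
# Hypothetical component list; check setenv.bash for the real HELM_APPS value.
all_apps = ["aai", "appc", "clamp", "aaf", "vfc", "sdc", "robot"]

# The three components named above as not yet stable.
unstable = {"clamp", "aaf", "vfc"}


def stable_subset(apps, excluded):
    """Keep the original order, dropping the excluded components."""
    return [app for app in apps if app not in excluded]


helm_apps = stable_subset(all_apps, unstable)

# Printed as a bash array literal, assuming HELM_APPS is declared that way.
print("HELM_APPS=(" + " ".join("'%s'" % app for app in helm_apps) + ")")
```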
Yes, been thinking about this for some time - and I have seen issues where we don't pick up problems we should have with for example the openecomp to onap refactor earlier this week - As you know from the TSC meeting yesterday - the manifest is still in flux in the move to the dockerhub versions
I am not sure yet - but I would expect that master continues to pull from nexus/nexus3, and the R1 branch pulls from dockerhub - but need to verify - put a watch on the JIRA - I usually update them with critical info/links/status
I have successfully started ONAP on Kubernetes with the apps below in setenv.sh. All pods show 1/1 Running, but when I log in to the portal I only see SDC. Why are the other modules not appearing in the portal?
Thanks Rahul Sharma. I have encountered another issue: SDC keeps giving me a 500 error saying I am not authorized to view this page when I log in as cs0008. I see in the comments above that this is a known issue. Is there a workaround for this, or can I pull older/stable code to avoid it?
This is a great accomplishment for us to start playing with - thanks a lot Amar and Prakash for your effort putting things together. One thing I mentioned earlier in the call: we probably need to review and upgrade from Docker 1.12 (2 years old). Docker moved to 1.13 last year and now ships Docker CE (public) and Docker EE (Enterprise), with version numbers starting with Docker 1.17.x (2017=1.17, 2018=1.18). Also, Rancher is not mandatory just to build Kubernetes: I have met several customers in production who build Kubernetes 1.6, 1.7 or 1.8 quite easily using Kubeadm in a few minutes (skipping Rancher). Rancher is good for other use cases where customers need a multi-orchestrator environment (K8s, Mesos, Swarm). I don't see real value in having Rancher in our ONAP document, where it might confuse people into thinking Rancher is mandatory just for bringing up K8s. Another thing: at the last Docker conference I attended, I heard that Kubernetes will soon support Containerd, in which case the CLI command to run will be "crictl" rather than "kubectl", allowing Kubernetes to work directly with Containerd and thus improving Kubernetes performance, which ONAP will fully benefit from (GA will be end of 2017). We probably need to closely follow where the Kubernetes community is heading and update our documentation accordingly. It is kind of difficult to update our documentation every month, but keeping up with Kubernetes is a good way to catch up in my opinion...
I agree - we will move from docker 1.12 when we move from Rancher 1.6.10 to Rancher 2.0 - where we can use 1.17.x - but it is a Rancher + Docker + Kubernetes config issue.
Rancher is not required - we tried minikube, there are also commercial CaaS frameworks - however Rancher is the simplest and fastest approach at the moment.
You are welcome to join the OOM call at 10AM EDT on Wed - we usually go through the JIRA board - and the Kubeadm work sounds like a good Epic to work on. We are very interested in various environments and alternatives for running our pods - please join.
There is also a daily OOM blitz on stabilizing the branch and deploying the vFirewall use case that you are welcome to attend
1200EDT noon until either the 4th Dec KubeCon or the 11 dec ONAP developer conference.
Hi all. I have a question. The page for installation using HEAT says 148 vCPUs are needed, but this page describes 64 vCPUs as needed. Why is there such a big difference? Are there differences in the items that can be installed?
Good question, as you know CPU can be over-provisioned - threads will just queue more, unlike RAM and HD which cannot be shared. 64 vCPUs is a recommended # of vCPUs based on bringing up the system on 64 and 128 core systems on AWS - we top out at 44 cores during startup (without DCAE - so this may be multiplied by 3/2 in that case as DCAE has 1/3 the containers in ONAP). Therefore for non-staging/non-production systems you will not gain anything having more than 44 vCores until we start hammering the system with real world VNF traffic. The HEAT provisioning is a result of the fact that the docker allocation model is across multiple silo VMs and not flat like in Kubernetes currently. Therefore some servers may only use 1/8 where others may peak at 7/8. It all depends on how you use ONAP.
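To spell out the 3/2 factor: if DCAE accounts for one third of the containers, the measured 44 cores cover the other two thirds. The sketch below just does that arithmetic, under the assumption (not verified here) that CPU load scales with container count.

```python
from fractions import Fraction

peak_cores_without_dcae = 44               # startup peak quoted above
dcae_share_of_containers = Fraction(1, 3)  # DCAE is ~1/3 of ONAP containers

# The remaining containers are 2/3 of the system, so the full system
# scales by 1 / (1 - 1/3) = 3/2, assuming load tracks container count.
scale = 1 / (1 - dcae_share_of_containers)
estimated_peak_with_dcae = peak_cores_without_dcae * scale

print(scale)                     # 3/2
print(estimated_peak_with_dcae)  # 66
```

So a rough startup peak with DCAE would be about 66 cores, close to the 64 vCPU recommendation given that CPU can be over-provisioned.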
You can get away during development with 8 vCores - ONAP will startup in 11m instead of 7 on 32 vCores.
Since DCAE is not currently in Kubernetes in R1 - then you need to account for it only in openstack.
Depending on the VNF use case you don't need the whole system yet, for example the vFW only needs 1.0.0. era components, where vVolte and vCPE will need new R1 components - see the HELM_APPS recommendation in this wiki.
Similar ONAP HEAT deployment (without DCAE or the OPEN-O VM - triple the size in that case) - this will run the vFirewall but not to closed-loop.
Thank you for answering my question. It makes it easier for me to understand. I'll use the HEAT installation and temporarily allocate 148 vCPUs because I need to use DCAE. I'll also look at the page you referenced.
I was getting the following error when running "./createConfig.sh -n onap"
Error: release onap-config failed: namespaces "onap" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "onap"
I think the difference between the two versions is the init container: in v1.8.3 it waits for the dependent container to come up, and sometimes the dependent container times out for me, like vnc-portal.
For example, drools checking for brmsgw to come up:
2017-11-27 08:16:46,757 - INFO - brmsgw is not ready.
2017-11-27 08:16:51,759 - INFO - Checking if brmsgw is ready
2017-11-27 08:16:51,826 - INFO - brmsgw is not ready.
2017-11-27 08:16:56,831 - INFO - Checking if brmsgw is ready
2017-11-27 08:16:56,877 - INFO - brmsgw is ready!
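The log above shows a poll-and-wait pattern with roughly 5 seconds between checks. Here is a minimal sketch of such a loop with a timeout; it is not the actual readiness-check script, and the function name, parameters and default timeout are assumptions chosen for illustration.

```python
import time


def wait_until_ready(name, is_ready, interval=5.0, timeout=300.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        print("Checking if %s is ready" % name)
        if is_ready():
            print("%s is ready!" % name)
            return True
        print("%s is not ready." % name)
        if clock() >= deadline:
            # Give up so the caller can report the dependency as timed out.
            return False
        sleep(interval)


# Usage with a fake probe that succeeds on the third attempt
# (sleep is stubbed out so the example runs instantly).
attempts = iter([False, False, True])
ok = wait_until_ready("brmsgw", lambda: next(attempts), sleep=lambda s: None)

# A probe that never succeeds returns False once the timeout has passed.
timed_out = wait_until_ready("never-up", lambda: False, timeout=0.0,
                             sleep=lambda s: None)
```

If the dependent container needs longer than the timeout to start, a loop like this gives up, which would match the timeouts described above.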
2) Use the docker ps -a command to list the containers.
root@k8s-2:/# docker ps -a | grep sdc-be
347b4da64d9c nexus3.onap.org:10001/openecomp/sdc-backend@sha256:d4007e41988fd0bd451b8400144b27c60b4ba0a2e54fca1a02356d8b5ec3ac0d "/root/startup.sh" 53 minutes ago Up 53 minutes k8s_sdc-be_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_0
2b4cf42b163a oomk8s/readiness-check@sha256:ab8a4a13e39535d67f110a618312bb2971b9a291c99392ef91415743b6a25ecb "/root/ready.py --con" 57 minutes ago Exited (0) 53 minutes ago k8s_sdc-dmaap-readiness_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_3
a066ef35890b oomk8s/readiness-check@sha256:ab8a4a13e39535d67f110a618312bb2971b9a291c99392ef91415743b6a25ecb "/root/ready.py --con" About an hour ago Exited (0) About an hour ago k8s_sdc-be-readiness_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_0
1fdc79e399fd gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df
3) Use this command to see the docker logs
docker logs 347b4da64d9c 2>&1 | grep -iE 'err|exception'
4) Observe the error logs and exceptions.
Currently we are getting below mentioned exceptions:
Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration
[2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] (sdc-normatives::upgrade_Normatives line 7) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.openecomp.sdcrests.health.rest.services.HealthCheckImpl]: Constructor threw exception; nested exception is java.lang.ExceptionInInitializerError
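If the container log has been saved to a file or string, the error and exception lines can be filtered with a few lines of Python. This is only a sketch of the grep step described above; the INFO lines in the sample are invented filler, and the other two lines are shortened from the exceptions listed here.

```python
import re

# Case-insensitive match on "error" or "exception" anywhere in the line.
ERROR_PATTERN = re.compile(r"error|exception", re.IGNORECASE)


def error_lines(log_text):
    """Return only the lines that mention an error or exception."""
    return [line for line in log_text.splitlines()
            if ERROR_PATTERN.search(line)]


# The INFO lines are made-up filler for the example.
sample_log = """\
INFO: starting sdc-be
[2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] had an error
org.springframework.beans.BeanInstantiationException: Failed to instantiate
INFO: done
"""
hits = error_lines(sample_log)
for line in hits:
    print(line)
```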
We are following below mentioned link for configuration.
Installing on Azure - other than the network security groups via portal.azure.com screenshots seemed to go okay up to running cd.sh.
You need to number the steps since sometimes it's not obvious when you are switching to a new task vs describing some future or optional part. I had to be careful not to blindly copy/paste since you have multiple versions in the steps, some with notes like "# below 20171119- still verifying - donot use", which was confusing. The video has the steps, which is good, but it's tedious to start/stop the video and then look at the next step in the wiki. I will update when it completes.
Do we need to add port 10250 to the security groups? I got error messages on cd.sh (but admittedly I didn't watch that part of the video).
Azure VMs seem to only have a 30GB OS disk. I can add a data disk but I think I should run the install from someplace other than root. Is that simple to change in cd.sh?
missing from OOM (looks like we don't need these at least until after vf-module creation - or we are just missing jms messages)
5bc9e04a29e3 onap/sdnc-ueb-listener-image:latest "/opt/onap/sdnc/ue..." 2 days ago Up 2 days sdnc_ueblistener_container
2fc3b79f74d2 onap/sdnc-dmaap-listener-image:latest "/opt/onap/sdnc/dm..." 2 days ago Up 2 days sdnc_dmaaplistener_container
for SDC - would raise a JIRA but I don't see the sanity container in HEAT - I see the same 5 containers in both
HEAT
root@onap-sdc:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9622747f5df2 nexus3.onap.org:10001/openecomp/sdc-frontend:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 0.0.0.0:8181->8181/tcp, 8080/tcp, 0.0.0.0:9443->9443/tcp sdc-FE
85733ad254f7 nexus3.onap.org:10001/openecomp/sdc-backend:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 0.0.0.0:8080->8080/tcp, 0.0.0.0:8443->8443/tcp sdc-BE
5ece278fb37c nexus3.onap.org:10001/openecomp/sdc-kibana:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 0.0.0.0:5601->5601/tcp sdc-kbn
d75c2263186d nexus3.onap.org:10001/openecomp/sdc-cassandra:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 7000-7001/tcp, 0.0.0.0:9042->9042/tcp, 7199/tcp, 0.0.0.0:9160->9160/tcp sdc-cs
25d35c470325 nexus3.onap.org:10001/openecomp/sdc-elasticsearch:v1.1.0 "/root/startup.sh" 2 days ago Up 2 days 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp sdc-es
OOM
ubuntu@ip-172-31-82-11:~$ kubectl get pods --all-namespaces -a | grep sdc
onap-sdc sdc-be-2336519847-knfqw 2/2 Running 0 40m
onap-sdc sdc-cs-1151560586-35df3 1/1 Running 0 40m
onap-sdc sdc-es-2438522492-8cfj1 1/1 Running 0 40m
onap-sdc sdc-fe-2862673798-4fgzp 2/2 Running 0 40m
onap-sdc sdc-kb-1258596734-z4970 1/1 Running 0 40m
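The two listings can be compared mechanically. The sketch below strips the generated suffix from the OOM pod names and lower-cases the HEAT container names; treating HEAT's "sdc-kbn" as the same component as OOM's "sdc-kb" is my assumption, as are the helper names.

```python
import re

# Names copied from the HEAT and OOM listings above.
heat_containers = ["sdc-FE", "sdc-BE", "sdc-kbn", "sdc-cs", "sdc-es"]
oom_pods = [
    "sdc-be-2336519847-knfqw",
    "sdc-cs-1151560586-35df3",
    "sdc-es-2438522492-8cfj1",
    "sdc-fe-2862673798-4fgzp",
    "sdc-kb-1258596734-z4970",
]

# Assumed alias: HEAT "sdc-kbn" corresponds to the OOM "sdc-kb" pod.
ALIASES = {"sdc-kbn": "sdc-kb"}

# Generated "-<digits>-<random>" suffix at the end of a pod name.
POD_SUFFIX = re.compile(r"-\d+-[a-z0-9]+$")


def component_from_pod(pod_name):
    """Strip the generated suffix, e.g. sdc-be-2336519847-knfqw -> sdc-be."""
    return POD_SUFFIX.sub("", pod_name)


def normalize_heat(name):
    """Lower-case a HEAT container name and apply the assumed alias."""
    lowered = name.lower()
    return ALIASES.get(lowered, lowered)


heat = {normalize_heat(name) for name in heat_containers}
oom = {component_from_pod(pod) for pod in oom_pods}

missing_from_oom = sorted(heat - oom)
missing_from_heat = sorted(oom - heat)
print(missing_from_oom, missing_from_heat)
```

Both differences come out empty for these listings, which agrees with seeing the same 5 containers in both.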
You did point out the disk size requirements in the video. The issue is really that AWS makes that a setting at VM create and Azure you have to separately create the data disk (or at least I couldn't find a way to do it on the original create via the portal)
BTW, thanks Brian for the review - when I started I brought up HEAT in May 2017 and enumerated all the containers to get a feel - we should have done another pass on all the vms - but without someone who would know the optional ones like in SDC we would have missed the sdc-sanity one - thanks
You can run the scripts from anywhere - I usually run as ubuntu not root - the reason the rancher script is root is because you would need to log out back in to pick up the docker user config for ubuntu.
I run either directly in /home/ubuntu or /root
The cloned directory will put oom in either of these
For ports - yes try to open everything - on AWS I run with an all open CIDR security group for ease of access - on Rackspace the VM would need individual port openings
Yes, the multiple steps are confusing - trying to help out a 2nd team that is working using Helm 2.7 to use the tpl function - I'll remove those until they are stable
Updated wiki - thought I removed all helm 2.6/2.7 - I was keeping the instructions on aligning the server and client until we fix the vnc-portal issue under helm 2.6 - this wiki gets modified a lot as we move through all the rancher/helm/kubernetes/docker versions
Hi, I'm new to ONAP and cloud computing in general, but trying to work through the above guide. I'm at the point where I'm waiting for the onap pods to come up. Most have come up, but some seem to be stuck after 2 hrs. I'm wondering if perhaps I have insufficient memory available. I'm installing on a KVM VM with 16 vCPU, 55G RAM and 220G HD.
One thought is to shut down the VM, increase RAM to about 60G and restart, but I'm uncertain as to the potential implications. Any suggestions as to how I could proceed would be greatly appreciated.
Unless you've taken the step to remove some components from the HELM_APPS variable in the setenv.bash script (after the oom repository was cloned), you very likely require 64 GB of RAM.
I've successfully deployed a subset of the components in a 48GB RAM VM with HELM_APPS set to this:
Thanks a lot James. I have 72G on my host, but would like to leave room for additional VMs, like vFirewall. So I'll try removing some components as you suggested. That will give me an opportunity to try the clean up.
Anyone who tries to install/deploy the ONAP SDC container will hit an issue with the SDC pod coming up.
Exceptions:
Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration
[2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] (sdc-normatives::upgrade_Normatives line 7) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.openecomp.sdcrests.health.rest.services.HealthCheckImpl]: Constructor threw exception; nested exception is java.lang.ExceptionInInitializerError
Correct, looks like a standard spring bean startup error -specific to SDC -which should also be failing in the HEAT deployment - I tested last night release-1.1.0 to test a merge in oom and all my pods are up except the known aaf - also the CD job is OK
this bothers me though - as I hope we are not missing something that only yourself sees - will look more into it - you are using 1.1.0 or master (master may have issues)
Also are you bringing up anything - as if you check the yaml there are dependencies
In your onap-discuss post last night - you did not have the dependent pods up - did this fix the issue - I quickly looked at the code and the HealthCheckImpl class is doing healthchecks - which I would expect to fail when dependent pods are not up
The easiest way is to go to the Kubernetes UI, then under the onap-robot namespace, click on the Deployments tab, then click the three dots next to the deployment to update (in this case, robot). A window will pop up where you can edit, among the other deployment parameters, the image version. Then click update. This will bounce the deployment (hence the pod), and will create a new deployment with the changes.
SDNC org.ops4j.pax.logging.cfg isn't the same as the file in gerrit. I noticed there is a different file in dockerdata-nfs/onap/log/sdnc that appears to come from the OOM repo instead of the CCSDK repo (the same OOM file looks to be used for appc). Why isn't the SDNC logging configuration being used?
What you're mentioning, Brian, is the major issue we currently have in OOM: we need to fork the projects' config in order to adjust it to the Kubernetes context, whether for address resolution or for logging. I'll let Michael O'Brien explain what was done for the logs. But the overall purpose wrt logging is to centralize them and have them browsable through a Kibana interface (using logstash). Regarding address resolution, Kubernetes provides its own way of resolving services within namespaces: <service>.<namespace>:<internal-port>. Because of this, everywhere in the config where there is some network config, we change it to leverage k8s networking.
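As a small illustration of the <service>.<namespace>:<internal-port> form, the sketch below builds such an address. The helper name is hypothetical; the example values are the DCAE controller host and port that appear in the health check output earlier on this page.

```python
def k8s_service_address(service, namespace, port):
    """Build an in-cluster address of the form <service>.<namespace>:<port>."""
    return "%s.%s:%d" % (service, namespace, port)


# Values taken from the health check log: dcae-controller.onap-dcae, port 9998.
address = k8s_service_address("dcae-controller", "onap-dcae", 9998)
print(address)
```

A forked config file would then carry an address like this in place of the hostname or IP used in the HEAT deployment.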
Brian, yes there is a centralized logging configuration that has the RI in the logging-analytics repo - this ELK stack available on the onap-log kibana container internal port 5601 uses a filebeat container (all the 2/2 pods) to pipe the logs in through a set of PV's using the emptyDir directive in the yaml. A logging spec is being worked out.
Well the logging team needs to find a solution for the heavy user of the local logs where we turn on DEBUG/TRACE and generate huge amount of log entries while we step through the DG processing. The SDNC logging.cfg also creates the per DG files of data. I guess I can simply replace the file in dockerdata-nfs with the version I can use for support but it seems like we need a better solution that can fit both needs. Can't the logging.cfg support both the common onap logs and the SDNC specific DEBUG logging in the /opt/opendaylight/current/data/log directory ?
I am using release 1.1.0. It was working till Monday 4th Dec; after that we cleaned up everything and redeployed the pods to test something in my environment.
After that, SDC-be and SDC-fe never come up. We tried this on 2-3 more setups but the problem still persists.
I suspect the problem is that the prepull_docker.sh script is not able to pull the images currently required for SDC.
I am bringing up a clean release-1.1.0 environment to record an SDC video for another issue - so I will verify this again.
Anyway the healthcheck on the CD server is OK - the only difference is that the images are cached there right now - so on the off chance that the images were removed or not available via nexus3 - this will be seen on a clean EC2 server shortly. ( a real CD server that brings up a clean VM every time is in the works)
In master (I am also testing a patch) - I get the following (ignore aaf) in master
could be an image issue (different images in 1.1.0 and master) - or a config issue that has not been cherry picked to master yet (we are running the reverse), note portal depends on sdc - sdc is the issue
Make sure you use release-1.1.0 - as this is our stable branch right now
See separate mail on onap-discuss - we are stabilizing master - doing the last of Alexis de Talhouët cherry picks from stable release-1.1.0 - then SDC and AAI should come up
I recommend running a full set of pods in release-1.1.0 for now - you can also assist in testing master once the merges are in so we can declare it open for pending feature commits
Atul hi, thanks for the effort helping us stabilize - Alexis de Talhouët and the AAI team have fixed the 2 aai-service and aai-traversal issues that popped up at 10am Friday on release-1.1.0 - you can use that branch again.
Are you going to clean and rebuild release 1.1.0 for prepull_docker images?
Is there any alternative to proceed ?
I tried release 1.1.0 again today to bring up all my ONAP components (especially AAI and SDC), but I am facing the same issue: my SDC component does not come up.
There is no issue with the prepull - it is just a script that greps the docker image tags for all values.yaml - v1.1.0 in most cases.
If you run cd.sh at the top of the page - it will clean your environment and upgrade it - or check out the commands if you want to do it yourself. There is no issue with the release-1.1.0 branch (besides a single not-required aaf container) - release-1.1.0 is stable as of 20171208:2300 EDT
As a check can you cover off each of the steps if you don't use the automated deploy script
(delete all pods, delete your config pod, remove dockerdata-nfs, source setenv.sh (make sure your onap-parameters.yaml is ok), create config, wait for it, (prepull is optional - it just speeds things up), create pods, run healthcheck, PUT cloud-region to AAI ...)
Remember we have not had an answer yet on your config - sdc will not come up unless dependent pods are up - for example - just try to run everything to start - then fine tune a subtree of pods later.
please try the following script - it is running on the hourly CD server and 3-4 other environments OK
Hi. I am now trying to deploy ONAP on AWS using Kubernetes. Is it possible to install ONAP components on separate VMs? For example, install one aaf pod on a 64G VM, then install another aaf pod on a 32G VM.
Another question: does a namespace in Kubernetes equal a VM in HEAT, like the aaf VM and aai VM in the diagram?
Yes it is possible to run as many hosts as you like - this is the recommendation for a scalable/resilient system - there is a link to the SDNC initiative above - essentially you need to share the /dockerdata-nfs directory.
For your question about affinity - yes you can assign pods to a specific host - but kubernetes will distribute the load automatically and handle any failures for you - but if you want to change this you can edit the yaml either on the checked out repo - or live in the Kubernetes console.
There is the global namespace, for example "onap", then the pod/component namespace, such as "aai" or "aaf" - they combine as onap-aai - so the closest match to the HEAT VM model would be the pod namespace. However, a pod like onap-aai could have HA containers where individual containers like aai-resources have 2 copies split across hosts. Parts of a pod could also be split, like aai-resources on one host and aai-service on another. The global namespace allows you to bring up several deployments of ONAP on the same Kubernetes cluster, separated by namespace prefix and port assignment (300xx, 310xx for example).
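The naming rule can be sketched as below. The helper is hypothetical and the second global namespace "onap2" is an invented example of a parallel deployment; only the "onap" + "aai" = "onap-aai" combination comes from the explanation above.

```python
def component_namespace(global_namespace, component):
    """Combine the global namespace and component name, e.g. onap-aai."""
    return "%s-%s" % (global_namespace, component)


components = ["aai", "aaf"]

# "onap2" is a made-up second deployment on the same cluster.
deployments = {
    prefix: [component_namespace(prefix, name) for name in components]
    for prefix in ("onap", "onap2")
}
print(deployments)
```

Each deployment gets its own set of component namespaces, so two copies of aai never share a namespace.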
I have installed ONAP on Kubernetes on a single host machine following the manual instructions
Now I am trying to run the vFW demo in my setup. I am facing an error when I am onboarding the vFW-vSINK VSP using the SDC portal. The error occurs during the asset creation process after the VSP is imported into the catalog. Here is the error, also attaching the screenshot
Error code SVC4614
Status code 400
invalid content Group type org.openecomp.groups.heat.HeatStack does not exist
To give a back ground of the processes followed:
I installed Kubernetes and Rancher. Kubernetes environment was created using Rancher portal and it showed healthy state.
The onap-parameters.yaml file was edited according to my OpenStack setup running on a separate host.
Thanks for the information. Yes I am using release-1.1.0. In fact I re-created the PODS once again and the error got resolved. Now I have reached to a stage where I am able to create and distribute the vFW-vSINK services.
Alan, Hi, there are a couple components that fail healthcheck for up to 15 min after the readiness pod marks them as up - the liveness probe needs to be adjusted and the teams need to provide a better /healthcheck url
SDC healthchecks fail constantly. Even in the CI build history there is a failure in every build output I checked. Also this graph shows different results now:
Were you able to resolve the usecaseui-gui API health check issue above? Since I am facing the same issue, it would be great if you have any workaround for it.
No, usecaseui-gui still fails even in Jenkins: http://jenkins.onap.info/job/oom-cd/2123/console. I have not reached the point where I need these failing services; maybe for most of the use cases they are not needed at all.
I was able to create/deploy the vFirewall package (packet generator, sink and firewall VNF) on the OpenStack cloud, but I couldn't log in to any of the VNF VMs.
After debugging, I saw that I hadn't replaced the default public key with our local public key in the PACKET GENERATOR curl JSON UI. I am now deploying the VNF again (same vFirewall package) on the OpenStack cloud, and plan to give our local public key in both the pg and sink JSON APIs.
I have some queries for clarification: - How can we create a VNF package manually/dynamically using the SDC component (so that we are able to get into the VNF VM and access its capabilities)? - I want to implement Service Function Chaining for the deployed vFirewall; please let me know how to proceed with that.
PS: I have installed/deployed ONAP using Rancher on Kubernetes (on an OpenStack cloud platform) without the DCAE component, so I haven't been able to use Closed Loop Automation.
Could you please let me know the significance of the CURL command as mentioned in the cd.sh ( the automated script )
The curl query in cd.sh (the automated script that installs the ONAP pods) is failing. It has three parameters:
1. A JSON file (not sure whether we are supposed to use the same file as specified by the ONAP community or fill in our own OpenStack details). I have tried both.
2. A certificate file named aaiapisimpledemoopenecomporg_20171003.crt (which has NOT been attached along with the cd.sh script or specified anywhere else).
3. Another header (-H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI="). If I use this header, the script fails. If I remove this header, the PUT succeeds but the GET fails.
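As a side note on the third parameter: an HTTP Basic authorization header is just the base64 encoding of user:password, so it can be decoded to see which credentials the curl command sends. A minimal Python sketch (the helper name decode_basic_auth is made up for illustration; only the header value comes from the command above):

```python
import base64

def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Split an HTTP Basic 'authorization' header value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    # The token is base64("user:password"); split on the first colon only.
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

# Header value copied from the curl command quoted above.
user, password = decode_basic_auth("Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=")
print(user, password)
```

This header decodes to ModelLoader:ModelLoader. If a request carrying it is rejected, one possibility (an assumption, not verified here) is that the AAI instance does not accept those credentials.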
I am NOT sure of the significance of the curl command below in the cd.sh file. I was just doing the vFirewall onboarding when I noticed that this curl command is required.
Moreover, the robot scripts (both ./demo-k8s.sh init_robot and ./demo-k8s.sh init) are failing.
init_robot is failing: although we entered test as the password, the HTTP login does not accept it.
The init test case is failing with a 401 authorization error.
Could you please help! Thanks in advance!
cd.sh snippet :
echo "run partial vFW"
echo "curl with aai cert to cloud-region PUT"
Hi, the curls are an AAI POST and GET on the cloud region - this is required as part of testing the vFW. For yourself it is optional until you need to test some use case like the vFirewall.
If your init is failing then your cloud region and tenant are not set - check that you can read them in postman before running robot init (init_robot is only so you can see failures on the included web server - this should pass)
Thank you so much for the instant response. Glad to notice that all the queries have been addressed. But, still I am facing some errors:
I have tried running those CURL queries ( which I have pasted above ) by putting the complete set of our openstack values. Below is the list for the same.
As directed, we used just our OpenStack tenant ID and left the other values the same. The curl GET still shows the error.
When we added the resource-version ("resource-version":"1513077767531",) in the aai json file apart from tenant ID, then the CURL command was successful. We fetched the resource version using the CURL GET command.
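The resource-version step described above can be sketched in Python: read the current resource-version from the GET response and copy it into the PUT body before updating an existing object. This is only an illustration; the function name and the sample cloud-region fields are placeholders, and only the resource-version key and its value come from this thread.

```python
import json

def with_resource_version(put_body: dict, get_response_text: str) -> dict:
    """Return a copy of put_body carrying the resource-version from a prior GET."""
    current = json.loads(get_response_text)
    updated = dict(put_body)
    version = current.get("resource-version")
    if version is not None:
        # Updating an existing object: echo back the version we just read.
        updated["resource-version"] = version
    return updated

# Placeholder GET response and PUT body (field values are illustrative only).
get_response = '{"cloud-region-id": "RegionOne", "resource-version": "1513077767531"}'
body = {"cloud-region-id": "RegionOne",
        "tenants": {"tenant": [{"tenant-id": "<your tenant id>"}]}}
print(json.dumps(with_resource_version(body, get_response), indent=2))
```

On a first-time PUT, where the GET returns no resource-version, the body is left unchanged.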
But I am fairly sure that each person needs to fill in their OWN OpenStack details (rather than using the default details in the AAI JSON file).
The reason is that the init robot is still failing. If the robot test case has to pick up our OpenStack details via the onap-parameters.yaml file (rather than the defaults in the shared JSON file), then we should definitely pass only our own OpenStack details in the AAI JSON file. Please advise!
2. Also, I think we need to create a separate region like ( RegionThree) etc with our system openstack details , to make new entries in AAI.
2. Also, as discussed, I have checked the integration robot file used by ONAP-robot, the AAI username and password was as mentioned below:
3. I notice that the AAI logs are not being updated when we run these curl queries that enter data into AAI. Could you please let me know how to enable AAI logs?
The last update I can see in the AAI logs on my system is from 12 Dec, but for the past few days we have been constantly running curl queries to enter data into AAI.
I have logged in to the AAI-SERVICES container but no AAI logs can be seen. Screenshot attached for your reference.
4. Moreover, aai-services is not present in dockerdata-nfs folder. Not sure why? Other sub-modules are present though.
Hi, We appreciate your exercising of the system. You likely have run into a couple issues we currently have with SDC healthcheck and Kubernetes liveness in general. Please continue to raise any jiras on issues you encounter bringing up and running ONAP in general. SDC is currently the component with the least accurate healthcheck in Kubernetes or Heat.
Currently SDC passes healthcheck about 74% of the time - if we wait about 8 min after the readiness probe declares all the containers as ready 1/1. The issue with SDC (26%), SDNC(8%), APPC (1%) in general is that their exposed healthcheck urls do not always report the system up at the appropriate time.
The workaround is to delay healthcheck for now until the containers have run for a bit - 5-10 min - which is a normal warming of the system and caches in a production system.
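In script form, this workaround amounts to waiting first and then retrying, rather than failing on the first healthcheck. A rough Python sketch, where check is a stand-in for whatever runs the healthcheck (for example a wrapper around the robot script) and the timings are assumptions picked from the 5-10 min range mentioned above:

```python
import time
from typing import Callable

def wait_for_healthy(check: Callable[[], bool], warmup_s: float = 300.0,
                     retries: int = 3, interval_s: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait warmup_s, then run check up to `retries` times, pausing interval_s between tries."""
    sleep(warmup_s)  # let the containers and caches warm up first
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(interval_s)
    return False
```

The sleep parameter is injectable only so the function can be exercised without actually waiting.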
On the CD system, SDC comes up eventually 2/3 of the time - our issue is helping OOM and the component teams adjust the healthcheck endpoints to report proper liveness (not just 200 or a subset of rest functionality) - You both are welcome to help us with these and any other of our outstanding issues - we are expanding the team.
OOM SDC healthcheck failure 26% of the time even with 3 runs and 8 min wait state
In my case, the SDC never passed health checks even after waiting a couple of hours after everything is "Running" in kubectl. They passed health checks only after I restarted SDC. Which JIRA issue do you think this info is applicable to?
Gary Wu: For me, restarting SDC helped fix the Health-check. However when launching SDC UI, it failed to open (even though Health check was now passing).
For SDC-UI to work:
I had to restart ONAP (./deleteAll.bash -n onap; ./createAll.bash -n onap)
Made sure that SDC health check works after ONAP restart (with wait time ~ 10 min after containers start).
For this, I had to fix /etc/hosts in vnc-portal to change the SDC IP addresses since they change once you restart SDC.
However, I think I'm going to just re-deploy the entire ONAP until SDC passes the health check, since I don't know what other things become out-of-date if SDC is restarted by itself.
I also hit the same SDC problem after deploying ONAP. The health check still did not pass even 10 minutes after I restarted SDC (./deleteAll.bash -n onap -a sdc and ./createAll.bash -n onap -a sdc). It seems all SDC components were up except TITAN. I checked the log in the sdc-be container, /var/lib/jetty/logs/SDC/SDC-BE/error.log.3, and found that the Titan graph failed to initialize, throwing com.thinkaurelius.titan.core.TitanException. Any suggestions as to why Titan does not work?
{
  "sdcVersion": "1.1.0",
  "siteMode": "unknown",
  "componentsInfo": [
    {
      "healthCheckComponent": "BE",
      "healthCheckStatus": "UP",
      "version": "1.1.0",
      "description": "OK"
    },
    {
      "healthCheckComponent": "TITAN",
      "healthCheckStatus": "DOWN",
      "description": "Titan graph is down"
    },
    {
      "healthCheckComponent": "DE",
      "healthCheckStatus": "UP",
      "description": "OK"
    },
    {
      "healthCheckComponent": "CASSANDRA",
      "healthCheckStatus": "UP",
      "description": "OK"
    },
    {
      "healthCheckComponent": "ON_BOARDING",
      "healthCheckStatus": "UP",
      "version": "1.1.0",
      "description": "OK",
      "componentsInfo": [
        {
          "healthCheckComponent": "ZU",
          "healthCheckStatus": "UP",
          "version": "0.2.0",
          "description": "OK"
        },
        {
          "healthCheckComponent": "BE",
          "healthCheckStatus": "UP",
          "version": "1.1.0",
          "description": "OK"
        },
        {
          "healthCheckComponent": "CAS",
          "healthCheckStatus": "UP",
          "version": "2.1.17",
          "description": "OK"
        },
        {
          "healthCheckComponent": "FE",
          "healthCheckStatus": "UP",
          "version": "1.1.0",
          "description": "OK"
        }
      ]
    },
    {
      "healthCheckComponent": "FE",
      "healthCheckStatus": "UP",
      "version": "1.1.0",
      "description": "OK"
    }
  ]
}
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<** createGraph started **>
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<** open graph with /var/lib/jetty/config/catalog-be/titan.properties started>
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<openGraph : try to load file /var/lib/jetty/config/catalog-be/titan.properties>
2018-01-08T09:59:10.719Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:10.726Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:15.580Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:15.581Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:16.467Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: 10.42.243.240>
2018-01-08T09:59:16.468Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<RemoveHost: sdc-cs.onap-sdc>
2018-01-08T09:59:23.938Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.t.g.c.GraphDatabaseConfiguration||ActivityType=<?>, Desc=<Set default timestamp provider MICRO>
2018-01-08T09:59:23.946Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.t.g.c.GraphDatabaseConfiguration||ActivityType=<?>, Desc=<Generated unique-instance-id=0a2a0d4d395-sdc-be-1187942207-21tfw1>
2018-01-08T09:59:23.956Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:23.956Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.052Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:24.052Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.153Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: 10.42.243.240>
2018-01-08T09:59:24.153Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<RemoveHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.164Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.titan.diskstorage.Backend||ActivityType=<?>, Desc=<Initiated backend operations thread pool of size 96>
2018-01-08T09:59:34.186Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<createGraph : failed to open Titan graph with configuration file: /var/lib/jetty/config/catalog-be/titan.properties>
com.thinkaurelius.titan.core.TitanException: Could not initialize backend
    at com.thinkaurelius.titan.diskstorage.Backend.initialize(Backend.java:301) ~[titan-core-1.0.0.jar:na]
    at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1806) ~[titan-core-1.0.0.jar:na]
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:123) ~[titan-core-1.0.0.jar:na]
    at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94) ~[titan-core-1.0.0.jar:na]
    at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:62) ~[titan-core-1.0.0.jar:na]
    at org.openecomp.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:256) [catalog-dao-1.1.0.jar:na]
    at org.openecomp.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:207) [catalog-dao-1.1.0.jar:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_141]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_141]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:366) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:311) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:134) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:408) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1575) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:207) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1131) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1059) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:835) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:741) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:467) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1128) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1022) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:512) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
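To spot which SDC sub-component is failing without reading the whole response, the healthcheck JSON earlier in this comment can be walked recursively. A small Python sketch, assuming the response has the shape shown there (nested componentsInfo lists); the function name is illustrative and the sample is a shortened copy of that response:

```python
import json

def down_components(node: dict, path: str = "") -> list[str]:
    """Return 'PARENT/CHILD: description' for every component whose status is not UP."""
    found = []
    for comp in node.get("componentsInfo", []):
        name = path + comp["healthCheckComponent"]
        if comp.get("healthCheckStatus") != "UP":
            found.append(f"{name}: {comp.get('description', '')}")
        # Components such as ON_BOARDING carry their own nested componentsInfo.
        found.extend(down_components(comp, name + "/"))
    return found

sample = json.loads("""
{"sdcVersion": "1.1.0", "componentsInfo": [
  {"healthCheckComponent": "BE", "healthCheckStatus": "UP", "description": "OK"},
  {"healthCheckComponent": "TITAN", "healthCheckStatus": "DOWN", "description": "Titan graph is down"},
  {"healthCheckComponent": "ON_BOARDING", "healthCheckStatus": "UP", "description": "OK",
   "componentsInfo": [{"healthCheckComponent": "CAS", "healthCheckStatus": "UP", "description": "OK"}]}
]}
""")
print(down_components(sample))
```

For the shortened sample this reports only the TITAN entry, which matches the "Could not initialize backend" exception in the log.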
From what I have seen so far, health check seems to succeed immediately after containers are ready provided the worker node has enough CPU/Memory. In my case, the worker node had 48 vCPUs and 64GB RAM.
Syed Atif Husain: For PortalApps, looks like your system was unable to pull the image. One way to work around is to manually pull the image and also change the pullPolicy from Always to IfNotPresent (under $OOM_HOME/kubernetes/portal/values.yaml - see here).
For vnc-portal, the Pod would stay in 'PodInitializing' until the portalapps starts up, as it's defined as init-container dependency for vnc-portal (see here).
I needed to restart the sdnc dgbuilder container after loading DGs via mulitple_dgload.sh, and k8 started a new instance before I could do a docker start. What is the mechanism to restart a container so that it picks up a change made on the container's persistent storage?
It's exactly a docker rm. With K8S you never stop/start a container; you rm and re-create it (this is done automatically by K8S when a pod is deleted). So if the changed data is persisted, then it's OK to delete the pod, and hence the container, because the new one will pick up the new data.
K8S deployment manifest defines the contract for the pod, which in the end is the container. Deleting the pod does delete the container, and kubernetes, based on the deployment manifest, will re-create it. Hope it clarifies things.
It does clarify things but we will have to make sure the things we did in Docker like edit a file inside the container and do a stop/start or restart can be done in K8. This is actually a problem in debugging where the project teams will have to make changes to support debugging in K8. We had setup shared data in the container configuration so that we can edit values and then delete the pod to pick up the new values. This will be a tedious pain.
At the end of the day, a docker stop/docker start is just a lazy way to restart the process(es) running within the container. If the process(es) to restart are not tied to the container's liveness (e.g. PID 1), then instead of stopping and starting the container, we could simply stop and start the process within the container. I'm not too worried about this being a pain to debug, but we will see; I doubt I'm familiar enough with all of them (there are around 80 containers as of today for the whole of ONAP).
I think we need to add a volume link (-v in docker) for each app that we might need to modify configuration and do a restart - dgbuilder for instance has a script to bulk load DG's into the flows.json file but this file would be lost whenever the dgbuilder/node-red pod is restarted right now. This would not happen in regular docker on a stop/start or restart.
We need to take a running instance of ONAP using OOM, change each application in some normal way, and then restart to confirm that we aren't losing data on a restart. This is something we did in the HEAT/Docker/DockerCompose environment to make sure all the persistent storage settings were correct. Since k8 does a recreate instead of a restart, we may lose file-based configuration data. I would look at: add vFW netconf mount to APPC, add a flow to DG builder, create and distribute a model, instantiate a vFW, execute a closed loop policy on the vFW and vDNS; then restart all containers and confirm that the data created is still there and the same control loops still run. I suspect right now with an OOM installation that parts might not survive a docker stop and K8 re-create of the container (since we can't do a docker start).
I'm new to Kubernetes and to OOM, so the following question could have an obvious answer that I've completely missed.
Is there a reason not to use the following commands to expose the K8s containers, so that you don't have to log on via the VNC server, which is just a pain?
Good question, I guess we live with port mapping requiring the vnc-portal so we can run multiple environments on the same host each with 30xxx, 31xxx etc.. but in reality most of us by default run one set of ONAP containers. Myself when I work in postman I use the 30xxx ports except for using the SDC gui - in the vnc-portal.
I think we need a JIRA to run ONAP in an effective single port mapping config, where 8989 for example maps to 8989 outside the namespace and not 30211 - for ease of development.
as a directory that is mapped from the host file system so that updates to the flows.json file in /opt/onap/sdnc/dgbuilder/releases/sndc1.0/flows/flows.json would persist across restarts/recreates of the container ?
alternatively is there a way to temporarily set the restart policy to never so that we can manually update flows.json and then restart the existing container ?
The name here has to be the same as the one specified above; it serves as the ID that correlates the mounted folder.
The hostPath here implies that you have created the folder /dockerdata-nfs/{{ .Values.nsPrefix }}/sdnc/dgbuilder/releases on the host (where {{ .Values.nsPrefix }} is onap) and put the data you wish to persist in there.
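The name-matching rule above can be checked mechanically. A small Python sketch over a pod spec expressed as a dict; the volume name dgbuilder-releases and the helper name are hypothetical, the two paths are the ones mentioned in this thread, and volumes, volumeMounts, mountPath and hostPath are the standard Kubernetes field names:

```python
def unmatched_mounts(pod_spec: dict) -> list[str]:
    """Return volumeMount names that have no volume of the same name in the pod spec."""
    volume_names = {v["name"] for v in pod_spec.get("volumes", [])}
    missing = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in volume_names:
                missing.append(mount["name"])
    return missing

pod_spec = {
    "containers": [{
        "name": "dgbuilder",
        "volumeMounts": [{"name": "dgbuilder-releases",  # hypothetical volume name
                          "mountPath": "/opt/onap/sdnc/dgbuilder/releases"}],
    }],
    "volumes": [{"name": "dgbuilder-releases",  # must match the volumeMount name
                 "hostPath": {"path": "/dockerdata-nfs/onap/sdnc/dgbuilder/releases"}}],
}
print(unmatched_mounts(pod_spec))
```

An empty list means every mount is correlated with a declared volume.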
caused a redeployment but dgbuilder didn't like the hostPath since files it was expecting aren't on the host until the dgbuilder image is pulled. Not sure if its a permissions problem on the host directories.
Should we be using something more like EmptyDir{} (but that doesn't seem to take a path) ?
Brian, I forgot to mention that the data has to be put in the persisted directory on the host first. Mounting the host directory will overwrite the directory in the container. So the first time, all the data is in the persisted directory (on the host). Then you start the pod, and the persisted data will be mounted in the container. From there, you can edit the persisted data either from the server or from the pod itself.
Hi again, Very Good idea. A lot of the applications need a way to either expose config (log, db config) into the container or push data out (logs) to a NFS mapped share on the host. My current in-progress understanding of Kubernetes is that it wraps docker very closely and adds on top of docker where appropriate. Many of the docker commands exec, log, cp are the same as we have seen. For static persistent volumes there are already some defined in the yamls using volumeMounts: and volumes:. We also have dynamic volumes (specific to the undercloud VIM) in the SDNC clustering poc - https://gerrit.onap.org/r/#/c/25467/23. We still need places where volume mounts can be done to the same directory that already has an emptyDir stream into Filebeat (which has a volume under the covers) - see
For example the following has a patch that exposes a dir into the container just like a docker volume or a volume in docker-compose - the issue here is mixing emptyDir (exposing dirs between containers) and exposing dirs outside to the FS/NFS
I have used these existing volumes that expose the logback.xml file for example to move files into a container like the MSO app server in kubernetes from /dockerdata-nfs instead of using kubectl cp.
I myself will also look into PV's to replace the mounts in the ELK stack for the CD job - that is being migrated from docker-compose to Kubernetes and for the logging RI containers.
On the question of whether we can hold off on container restarts so that we can manually update a json exposed into the container: the model of Kubernetes auto-scaling is stateless. When I push pods without affinity rules, the containers randomly get assigned to any host, and bringing down a container, either manually or because of a health-initiated trigger, is usually out of the control of any OSS outside of Kubernetes - but there are callbacks. Rancher and Kubeadm, for example, are northbound to Kubernetes and act as VIMs, and in the same way that a spot VM going down in EC2 gives a 2 min warning, I would expect we could register as a listener to at least a pre-stop of a container - even though it is a second or 2. I would also like to verify this and document all of this on our K8S devops page - all good questions that we definitely need an answer for.
I was getting an error message since "-y" wasn't an allowed argument. Is cd.sh checked into gerrit.onap.org somewhere so we can reference that instead of the copy on the wiki? Maybe I'm just looking in the wrong spot.
Brian, hi, you are using amsterdam - the change done by Munir has not been ported from master.
I retrofitted the CD script to fix the jenkins job and patched github to align with the new default prompt behaviour of deleteAll
yes, ideally all the scripts northbound of deleteAll should be in onap - I will move the cd.sh script into a ci/cd folder in OOM or in demo - as it clones oom inside.
Also, I'll put in an if statement on the delete special to amsterdam to not require the -y option
Actually I think this will be an issue for anyone master/amsterdam that has cloned before OOM-528 - essentially we need a migration plan
In my case I brought up an older image of master from before the change - and the cd.sh script with the -y option fails on -y (because it is not resilient)
root@ip-172-31-48-173:~# ./cd.sh -b master
Thu Jan 11 13:48:59 UTC 2018
provide onap-parameters.yaml and aai-cloud-region-put.json
vm.max_map_count = 262144
remove existing oom
Usage: oom/kubernetes/oneclick/deleteAll.bash [PARAMs]
-u : Display usage
-n [NAMESPACE] : Kubernetes namespace (required)
-a [APP] : Specify a specific ONAP component (default: all)
from the following choices:
sdc, aai ,mso, message-router, robot, vid, aaf, uui
sdnc, portal, policy, appc, multicloud, clamp, consul, vnfsdk
-N : Do not wait for deletion of namespace and its objects
Therefore, unfortunately, anyone on an older branch needs to either do a git pull or edit cd.sh one time to remove the -y - after that you are OK and effectively upgraded to OOM-528.
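One way a wrapper could be made resilient to this is to look at the usage text before deciding whether to pass -y. A hedged Python sketch (the helper name is hypothetical, and the usage text is abbreviated from the output above):

```python
import re

def supports_flag(usage_text: str, flag: str) -> bool:
    """True if the usage text lists `flag` as an option at the start of a line."""
    pattern = r"^\s*" + re.escape(flag) + r"(\s|$)"
    return re.search(pattern, usage_text, flags=re.MULTILINE) is not None

# Abbreviated from the deleteAll.bash usage output quoted above.
usage = """Usage: oom/kubernetes/oneclick/deleteAll.bash [PARAMs]
-u : Display usage
-n [NAMESPACE] : Kubernetes namespace (required)
-a [APP] : Specify a specific ONAP component (default: all)
-N : Do not wait for deletion of namespace and its objects
"""
# Only add -y when the checked-out deleteAll.bash advertises it.
args = ["-n", "onap"] + (["-y"] if supports_flag(usage, "-y") else [])
print(args)
```

With the older usage text shown here, -y is not listed, so it is left out of the argument list.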
I will add a migration line to the last onap-discuss on this
I am new to ONAP. Yesterday I set up ONAP on a permanent AWS m4.large instance that uses a dynamic public IP. Today I removed the existing ONAP environment and created a new environment in Rancher. After adding the environment, when I try to add a host, Rancher does not detect the new public IP. In the register command, Rancher still refers to yesterday's public IP, which is no longer valid.
Please let me know the steps required to restart ONAP on a Dynamic IP based server which needs to be shutdown and restarted on daily basis.
Hi, that is a common issue with Rancher - it needs a static IP or DNS name.
You have a couple workarounds, elastic IP, elastic IP + domain name, edit the host registration URL in rancher, or docker stop/rm rancher and rerun it
I opt for elastic IP + DNS entry - in my case I register onap.info in Route53, create an EIP in the EC2 console, then associate the EIP with the labelled instance ID network ID before bringing up rancher/kubernetes/helm.
This will also allow you to save the AMI and bring it up later with a 20 min delay until it is fully functional - provided you keep the EIP and domain A record.
This is how the CD system works - see the following, but do not touch anything; it is used for deployment testing for the first 57 min of the hour. http://amsterdam.onap.info:8880/
Sorry, I was answering your first question from memory this morning - didn't realize you had added a 2nd comment with your workaround - yes, that is OK, but we agree - a lot of work. What you may do - and I will try this - is use a very small 4G machine with a static IP for the host that does not run the ONAP pods - they will all have affinity to a 2nd 64G host that has a dynamic IP - but the server must be static.
Another workaround that I have not tried is automated host authentication via REST or CLI - this I need to research.
But still the easier way is to bring up the EC2 VM with an EIP (it will cost $2 per month when not used though) - You should have an allocation of 5 on your AWS account - I asked for 10.
We ran prepull_docker.sh on 4 different k8s nodes at the same time, we got 75,78,80 and 81 images (docker images | wc -l), we verified the pulling process using (ps -ef | grep docker | grep pull), all pulling processes were completed. Do you know why we got different number images?
Yes, weird - intermittent errors usually mean the underlying cloud provider, I sometimes get pull errors and even timeouts - used to get them on heat as well. There are issues with nexus3 servers periodically due to load, upgrades and I have heard about a serious regional issue with mirrors. I do not know the cloud provider that these servers run on - the issue may be there. The script is pretty simple - it greps all the values.yaml files for docker names and images - there were issues where it parsed incorrectly and tried to pull just the image name or just the image version - but these were fixed - hopefully no more issues with the sh script.
There also may be issues with docker itself with 80 parallel pulls - we likely should add a -serial flag - to pull in sequence - it would be less performant.
you can do the following on a clean system to see the parallel pulls in progress and/or count them
ps -ef | grep docker | grep pull | wc -l
In the end there should be no issues because anything not pulled in the prepull will just get pulled when the docker containers are run via kubectl - they will just start slower the first time.
Please note that there are a couple of "huge" images on the order of 1-2G, one of them for SDNC - and I have lately seen issues bringing up SDNC on a clean system - it required a ./deleteAll.bash -n onap -a sdnc and re-running ./createAll.
Another possibility is that docker is optimizing or rearranging the pulls and running into issues depending on the order.
Another issue is that the 4 different servers may have different image sets, as docker images | wc -l may be picking up server or client images only present on one or more of the nodes. If you look at a cluster of 4 servers (I have one), the master has a lot more images than the others, and the other 3 clients usually run different combinations of the 6 kubernetes servers - for what reason I am still looking at - before you even bring up the onap containers.
Let's watch this - there is enough writing here to raise a JIRA - which I will likely do.
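Rather than comparing only the counts, it may help to compare the image names per node. A minimal Python sketch; the node names and the second image are made up, and the lists are assumed to be collected on each node with something like docker images --format "{{.Repository}}:{{.Tag}}":

```python
def missing_images(images_by_node: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each node, return the images seen on some other node but not on this one."""
    all_images = set().union(*images_by_node.values())
    return {node: all_images - images for node, images in images_by_node.items()}

# Hypothetical node names and image lists, for illustration only.
nodes = {
    "node1": {"nexus3.onap.org:10001/openecomp/sdc-elasticsearch:1.0-STAGING-latest",
              "example/image-a:1.0"},
    "node2": {"nexus3.onap.org:10001/openecomp/sdc-elasticsearch:1.0-STAGING-latest"},
}
for node, missing in sorted(missing_images(nodes).items()):
    print(node, sorted(missing))
```

Images that appear on only one node (for example Kubernetes or Rancher system images) would show up as missing on the others, which would help separate failed pulls from legitimate per-node differences.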
Michael O'Brien - I am trying to bring up vid, robot, and aai w/ the latest oom, seeing this error on several aai pods:
Error: failed to start container "filebeat-onap-aai-resources": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/log/filebeat/logback/filebeat.yml\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/2234aef661aa61185f7fb8fd694ec59d29f82c2478d9de1beee0a282e4af4936\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/2234aef661aa61185f7fb8fd694ec59d29f82c2478d9de1beee0a282e4af4936/usr/share/filebeat/filebeat.yml\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
The config job seems to have failed with an error but it did create the files under /dockerdata-nfs/onap
Hi, good question and thank you for all your help with OOM code/config/reviews.
That particular error "not a directory" is a sort of red herring - it means one of 2 things: either the container is not finished initializing (the PVs and volume mounts are not ready yet - it will go away after the pod tree is stable), or your config pod had an issue, which is not recoverable without a delete/purge. These errors occur on all pods for a while until the hierarchy of dependent pods is up and each one goes through the init cycle. However, if you still see them after the normal 7-15 min startup time and they do not pass config, then you likely have an issue with the config pod pushing all the /dockerdata-nfs files (this is being removed and refactored as we speak), due to missing config in setenv.bash and onap-parameters.yaml (it must be copied to oom/kubernetes/config).
Also, that many failures usually means a config pod issue - or a full HD or RAM issue. If you have over 80G HD (you need 100G over time) and over 51G RAM, then it is a config pod issue.
How to avoid this: see the cd.sh script attached and linked to at the top of the page - this is used to provision a system automatically on the CD servers we run the hourly jenkins job on - the script can also be used by developers wishing a full refresh of their environment (delete, re-pull, config up, pods up, run healthcheck...)
If you are running the system manually - use the cd.sh script or the manual instructions at the top in detail - the usual config issue is forgetting to configure onap-parameters.yaml (you will know this by checking the config pod status). The second usual issue is failing to source setenv.bash to pick up the docker and other env variables - this will also fail the config container.
kubectl get pods --all-namespaces -a
it must say
onap config 0/1 Completed 0 1m
do the following to see any errors - usually a missing $variable set
kubectl --namespace onap logs -f config
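The status check above can be scripted. This is only a minimal sketch, assuming the pod listing has the columns NAMESPACE NAME READY STATUS RESTARTS AGE as in the line shown above; the function name config_pod_completed is hypothetical and not part of OOM.

```shell
# Hypothetical helper (not part of OOM): succeed only when the pod listing on
# stdin contains the pod "config" in namespace "onap" with STATUS "Completed".
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
config_pod_completed() {
  awk '$1 == "onap" && $2 == "config" && $4 == "Completed" { found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Usage against a live cluster (left as a comment, not run here):
#   kubectl get pods --all-namespaces -a | config_pod_completed \
#     || kubectl --namespace onap logs config
```

The function only reads text, so it can be tried on a saved listing before pointing it at a real cluster.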
as of an hour ago these were the failing components - no AAI, vid or robot
As an additional reference you can refer to the running master CD job - for the times when you might think it is actually failing - not just locally.
00:08:17 Basic A&AI Health Check | PASS |
00:08:17 ------------------------------------------------------------------------------
00:08:18 Basic VID Health Check | PASS |
Also AAI has not been failing healthcheck for at least the last 7 days - actually I think since the first week of Dec 2017 - once - it is one of the most stable ONAP components
Let me know if this fixes your issues - if your config pod is busted - then you will need to deleteAll pods, purge the config pod and rerun setenv, config pod and createAll - see the script for the exact details
Thanks Michael O'Brien, I needed to refresh the config pod and once i got "completed" I was able to get aai and several others going! Thanks for your help!
This is a pretty basic question. I've been having some trouble with getting SDNC running (still troubleshooting) but was then looking at the readiness docker image and trying to understand how it works.
I think I understood most of it, but I couldn't figure out how the value of the "K8S_CONFIG_B64" environment variable was being set, as there seems to be some "magic" for this, and I was hoping somebody could give me a hint.
Andrew, hi, just to cover off SDNC - since clustering was put in - the images have increased in number and size - there may be a timeout issue. So on a completely clean VM you may need to delete and create -a sdnc to get around this issue that only appears on slow machines (those with less than 16 cores)
Last December (2017) I managed to deploy an almost-amsterdam version of ONAP using oom on a single Ubuntu VM. I used a manual list of commands (cd.sh was not available at the time) as explained on this page. The installation used:
Docker 1.12, Rancher server 1.6.10, Kubernetes 1.8.6, Helm 2.3.0
Most containers came up. Over time (weeks) things degraded.
Back from the holidays I tried to reinstall from scratch (this time I'm aiming for the amsterdam branch) and had issues with Rancher.
To remove the possibility that my host was corrupted in some way, today I used a brand new Ubuntu 16.04.4 VM and tried to create the same environment for ONAP. I executed the commands in oom_rancher_setup_1.sh by hand so that I could better control the docker installation and the usermod command.
I ended up with the same problem I had on my old VM yesterday.
The problem is as follows: In the Rancher Environment GUI I created a Kubernetes environment. Once I made it the default, the State became "Unhealthy". Rancher won't tell you why!
Then I tried anyway to add a host. When running the command:
The agent started to complain that it could not connect to the server. SSL certification is failing.
I get an output like this:
Unable to find image 'rancher/agent:v1.2.6' locally
v1.2.6: Pulling from rancher/agent
b3e1c725a85f: Pull complete
6a710864a9fc: Pull complete
d0ac3b234321: Pull complete
87f567b5cf58: Pull complete
063e24b217c4: Pull complete
d0a3f58caef0: Pull complete
16914729cfd3: Pull complete
2ce3828c0b9d: Pull complete
7df47a98fc4b: Pull complete
Digest: sha256:a68afd351c7417e6d66a77e97144113ceb7a9c3cdd46fb6e1fd5f5a5a33111cd
Status: Downloaded newer image for rancher/agent:v1.2.6
INFO: Running Agent Registration Process, CATTLE_URL=https://10.182.40.40:8880/v1
INFO: Attempting to connect to: https://10.182.40.40:8880/v1
ERROR: https://10.182.40.40:8880/v1 is not accessible (server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none)
ERROR: https://10.182.40.40:8880/v1 is not accessible (server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none)
ERROR: https://10.182.40.40:8880/v1 is not accessible (server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none)
ERROR: https://10.182.40.40:8880/v1 is not accessible (server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none)
^C
The Unhealthy state might be due to the web client having the same communication issue.
This does not appear to be an ONAP-specific issue, since I'm failing in one of the first installation steps, which is to get a Rancher server and agent working together.
This behavior was only observed upon my return on January 9th. In December I had no such issue.
Could a certificate be expired? Where are these certificates? (In the docker images I suspect)
Hi, welcome. Also very detailed and complete environment description - appreciated.
I am extremely busy still - but your post stood out. I will return in more detail on the weekend.
For now, yes, I have also had issues connecting the client - usually this involved a non-static IP, for example if I saved an AMI on AWS and got a different EIP. There are several fixes for that one - use a static EIP and/or assign a domain name to it. Also you can retrofit your server - I turned off security on the CD poc for a couple of days
Update: I reproduced the same SSL issues using a small vagrant VM (2 CPU, 2GB). The VagrantFile uses: config.vm.box = "ubuntu/xenial64"
From this VM I ran the following commands:
sudo curl https://releases.rancher.com/install-docker/1.12.sh | sh
sudo docker run -d --restart=unless-stopped -p 8880:8080 rancher/server:v1.6.10
# From the Rancher web-ui activated a Kubernetes environment
# then got (and exec) the following command to add a host
sudo docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.6 https://192.168.16.61:8880/v1/scripts/0D95310D5AF5AC047A37:1514678400000:f4CQEfzqgONjYc3vZlq6K9MbTA
I also tried rancher server v1.6.11. Same issues were seen.
Could you get past the issue? I have also manually installed the components, but I am unable to get ONAP up and running. It would be helpful if you could list the steps taken to install and run ONAP.
I could post my notes. They would look like a summary of information already on this page.
If some think it would be useful, I could do so.
In order to avoid too much redundancy on this page, could you tell us a bit more about where you have issues. Then maybe I could post a subset of my notes around this area.
Basically I see this installation being made of 2 major steps:
Install infrastructure: Docker 1.12, Rancher server 1.6.10, Kubernetes 1.8.6, Helm 2.3.0. After this step you should be able to go to the Rancher Web UI and see the rancher/kubernetes docker instances and pods running. This means running oom_rancher_setup_1.sh, which in my case I ran manually, followed by some interaction in Rancher's web UI to create a k8s env and add a host.
You can find more supporting debugging for the same SDNC SLI-API in the attached document.
After running the installSdncDb.sh script, logging into the SDNC container and then logging into the SDNC database, we found that the "VLAN_ID_POOL" table does not exist, though the database was showing that the mentioned table exists. It was present in stale format.
Rahul Sharma I followed the steps on the link above but I am facing issues related to connectivity to Openstack. I guess I am missing some basic setup in my openstack.
I have created a network and subnet on openstack. I am using their IDs in the param file for OPENSTACK_OAM_NETWORK_ID and OPENSTACK_OAM_SUBNET_ID respectively. What should I use for OPENSTACK_PUBLIC_NET_ID? Do I have to create another network? How do I ensure my ONAP VM is able to connect to the Openstack VM? [I have installed ONAP OOM on one Azure VM and Openstack on another VM].
Syed Atif Husain: OPENSTACK_PUBLIC_NET_ID should be one of the networks on your Openstack that's publicly accessible. One of the public IPs assigned to your vFW_x_VNF (x = SINC or PG) would belong to this network.
You don't need to create other networks: unprotected_private_net_id (zdfw1fwl01_unprotected), unprotected_private_subnet_id(zdfw1fwl01_unprotected_sub), protected_private_net_id(zdfw1fwl01_protected), protected_private_subnet_id(zdfw1fwl01_protected_sub) would be created as part of vFW_SINC stack deployment.
The "pub_key" attribute will be used to communicate with the VM on Openstack.
Note: the values sent in the SDNC-Preload step are used to create the stack; so if you want to update something, you can do it then.
Also, when I tested, my ONAP was running on Openstack; running ONAP on Azure should be similar considering that MultiVIM should take care of different platforms underneath but you can verify in that area. Have a look at the VF instantiation flow for Release 1.1 here
When I run cd.sh, the config pod isn't coming up. It's shown to be in error state. Does anyone know why this happens? In the kubectl logs, I see the following error: "DEPLOY_DCAE" must be set in onap-parameters.yaml.
You need to give the dcae related params in the onap-parameters.yaml file. Otherwise remove the dcae component from HELM_APPS in oom/kubernetes/oneclick/setenv.bash if you don't want to install dcae or if your openstack setup is not ready
Refer manual instructions under the section 'quickstart installation'
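A minimal sketch of dropping one component from HELM_APPS, assuming the list is defined on a single line of the form HELM_APPS=('a' 'b' ...) and that the DCAE entry is named dcaegen2 (the name that shows up in the deleteAll output on this page). Both are assumptions, so check your own setenv.bash first; the helper name drop_helm_app is hypothetical.

```shell
# Hypothetical helper: remove one quoted app name from a HELM_APPS=(...) line.
# Reads the file content on stdin and writes the edited content to stdout.
drop_helm_app() {
  sed "/^HELM_APPS=/ s/ *'$1'//"
}

# Usage (left as a comment, not run here; keeps a backup of the original):
#   cd oom/kubernetes/oneclick
#   cp setenv.bash setenv.bash.bak
#   drop_helm_app dcaegen2 < setenv.bash.bak > setenv.bash
```

Editing the file by hand works just as well; the point is only that the entry has to disappear from HELM_APPS before the scripts are rerun.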
I won't have time until later today to check - but if the config container complains about a missing DCAE variable - then there is a chance the config yaml is missing it
-----Original Message----- From: Michael O'Brien Sent: Tuesday, January 23, 2018 07:04 To: 'Pavan Gupta' <pavan.gupta@calsoftinc.com> Subject: RE: Issues with cd.sh sciprt
Pavan,
Hi, the script mirrors the manual instructions and runs ok on several servers including the automated CD server.
You place the 2 aai files and the onap-parameters.yaml file beside the cd.sh script and run it (this assumes you have run the rancher config ok)
I would need the error conditions pasted to determine if you missed a step - likely during the config pod bootstrap - could you post the errors on the config pod you see.
Also verify all versions and prerequisites, Rancher 1.6.10, helm 2.3.x, docker 1.12.x, Kubernetes 1.8.x
Try to come to the OOM meeting and/or raise a JIRA and we can look at it from there.
DCAE is in flux but there should be no issues with the 2.0.0 tag for the config container
I have posted this query on the wiki page as well. I could get the installation script working and moved on to running cd.sh. The config pod is shown in error state. I looked at the Kubernetes log and it says DEPLOY_DCAE should be set in the onap-parameters.yaml file. I tried setting this parameter, but the error still continues. Any idea what's going wrong or what needs to be done to resolve this issue?
I have setup onap via OOM via Rancher on VMware Workstation 14 and VMware Fusion 8 with no issues
The config in onap-parameters.yaml must point to an openstack user/pass/tenant so that you can create a customer/tenant/region in AAI as part of the vFW use case. You can use any openstack or Rackspace config - you only need keystone to work until you get to SO instantiation.
In the future we will be able to configure Azure or AWS credentials via work being done in the Multicloud repo.
Hi, I got to the point of getting a VNF deployed using the Kubernetes deployment, so I just wanted to let you know it can work in different environments.
I'm using Rancher and a host VM on a private Red Hat OpenStack.
I needed a couple of local workarounds, and I had to redeploy AAI as it didn't come up the first time.
However, SDNC didn't work and I had to change it from using the NFS server to using the Kubernetes volumes, as I was getting an error in the nfs-provisioner-.... pod referring to all the ports, but I think I have them all open etc.
Why is volume handling for SDNC different from the other namespaces?
Volume handling for SDNC is done differently for 2 reasons:
To support scaling SDNC (ODL-MDSAL and MySQL): For dynamically creating persistent-volumes when scaling MySQL pods, we need Storage classes. And to support Kubernetes deployed on local VMs, the 'nfs' based provisioner was one of the available options.
To make sure that volumes are persisted after Pod restart; hence cannot use pod's empty directory.
Not sure why nfs-provisioner isn't starting for you when you have the ports open?
1. We created a service and distributed it using the SDC UI. As per the SDC video, the service should be distributed to AAI, VID and MSO. List:
Vendor name : MyVendor
License agreement : MyLicenseAgreement
Entitlementpool : MyEntitlementPool
Service : vFW-vSINK-service
VSP : vFW-vSINK
2. After running the init robot testcase, we noticed that only the default services are listed. The service which we created using SDC is not visible in AAI.
3. The curl queries for SDC are not working. We tried many curl queries for the same, to fetch the service name/instance.
Pavan, Hi, that ubuntu 14 version is a leftover from the original heat parameters - it was used to spin up VMs (the original 1.0 heat install had a mix of 14/16 VMs - I don't know why we don't also list the 16 version). You can ignore it, as we are only using docker containers in Kubernetes right now.
After the installation, I tried http://10.22.4.112:30211 on the browser and the ONAP portal didn't open up. Not all services are shown 1/1 (please check the output below)
I am not sure why I can't see the ONAP portal now.
Following is the error message from Kubernetes. It's not able to pull the container image.
Failed to pull image "nexus3.onap.org:10001/onap/vfc/ztevnfmdriver:v1.0.2": rpc error: code = 2 desc = Error: image onap/vfc/ztevnfmdriver:v1.0.2 not found Error syncing pod
Check for the oom/kubernetes/portal/values.yaml file in the respective ONAP component (say vfc, portal, MSO, etc.) and look for the prepull policy option.
1/25/2018 11:06:59 AM 2018-01-25 19:06:59.777976 I | Using https://kubernetes.default.svc.cluster.local:443 for kubernetes master
1/25/2018 11:06:59 AM 2018-01-25 19:06:59.805097 I | Could not connect to Kube Masterthe server has asked for the client to provide credentials
Has anyone seen this issue or know how to solve it?
I guess this would be on Amsterdam. You need to update the kube2msb deployment file with your K8S token. In Rancher, under your environment, go to Kubernetes → CLI → Generate Config; this should give you your token to authenticate to the K8S API for your deployment.
stored passwd in file: /.password2
/usr/lib/python2.7/dist-packages/supervisor/options.py:297: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2018-01-25 21:47:52,310 CRIT Supervisor running as root (no user in config file)
2018-01-25 21:47:52,310 WARN Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2018-01-25 21:47:52,354 INFO RPC interface 'supervisor' initialized
2018-01-25 21:47:52,357 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-01-25 21:47:52,357 INFO supervisord started with pid 44
2018-01-25 21:47:53,361 INFO spawned: 'xvfb' with pid 51
2018-01-25 21:47:53,363 INFO spawned: 'pcmanfm' with pid 52
2018-01-25 21:47:53,365 INFO spawned: 'lxpanel' with pid 53
2018-01-25 21:47:53,368 INFO spawned: 'lxsession' with pid 54
2018-01-25 21:47:53,371 INFO spawned: 'x11vnc' with pid 55
2018-01-25 21:47:53,373 INFO spawned: 'novnc' with pid 56
2018-01-25 21:47:53,406 INFO exited: x11vnc (exit status 1; not expected)
2018-01-25 21:47:54,681 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: pcmanfm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: lxpanel entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: lxsession entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,683 INFO spawned: 'x11vnc' with pid 68
2018-01-25 21:47:54,683 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:56,638 INFO success: x11vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
The ONAP system/pods enter the CrashLoopBackOff state only when you delete the dockerdata-nfs data for the respective ONAP component.
For example, /dockerdata-nfs/portal has been deleted (rm -rf /dockerdata-nfs/portal). Now the ONAP system has no way of knowing which data to delete, so there are uncleaned/dangling links.
Solution:
If you have kept a backup of the dockerdata-nfs folder (either the complete folder or just portal), then put it back. The ONAP pods will take the data for portal from dockerdata-nfs. Then delete the onap-portal pod and create it again.
For the vnc-portal, I faced a similar issue today:
1. Run the command: kubectl describe po/<container-for-vnc-portal> -n onap-portal
2. Look for the docker image it has a dependency on - I think it's mariadb or some other docker image.
3. Run the command: docker images | grep <image found in step 2> Note: the respective image will be missing.
4. Pull the respective docker image (as found in step 2): docker pull <image name> Kubernetes will pick up the newly pulled docker image. The issue for vnc-portal will be resolved.
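The steps above can be sketched as follows. This is only an illustration, assuming the describe output lists each container image on a line of the form "Image: <reference>"; the helper name pod_images and the placeholders in angle brackets are hypothetical.

```shell
# Hypothetical helper: print the distinct image references found in
# `kubectl describe pod` output on stdin (lines of the form "Image: <ref>").
# "Image ID:" lines are skipped because their first field is "Image", not "Image:".
pod_images() {
  awk '$1 == "Image:" { print $2 }' | sort -u
}

# Usage against a live cluster (left as comments, not run here):
#   kubectl describe pod <vnc-portal-pod> -n onap-portal | pod_images
#   docker images | grep <image>    # is the image present on this host?
#   docker pull <image>             # pull it if it is missing
```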
Guys, it helps if you post your versions (onap branch, helm version, kubernetes version, rancher version, docker version), whether your config container ran ok 0/1 completed and that you have all dependent containers up (for example vnc-portal needs vid to start)
A common issue is helm related (helm 2.5+ running on amsterdam - stick to 2.3 on that branch)
When you say helm 2.5+, are you referring to the server version or the client? I only installed helm client v2.1.3 and I think rancher installs the helm server.
The ONAP branch I am using is amsterdam.
All the pods are up and running except for vnc-portal container in onap-portal namespace and elasticsearch container in onap-log
I followed the instructions specified in the below post by kranthi to solve the problem.
NOTE: Main reason for this issue is I did not have the recommended versions of helm/rancher & kubernetes. It was not so easy to align the versions so tried the below suggested fix and it worked for me. You can also try it and see if it solves your issue.
I had the same problem with the Amsterdam branch. The master branch has fixes that resolve this. Basically, the helm chart defines a lifecycle PostStart hook, which may run before the container itself has started (the ordering is not guaranteed). So, please take the portal folder from the master branch and replace the one in Amsterdam, or just replace the resources folder inside portal (from master) and also the portal-vnc-dep.yaml file inside template, from master to Amsterdam
Guys, follow or use as a reference the scripts below - it will create a rancher environment and install onap on either amsterdam or master (use your own onap-parameters.yaml)
Sorry for the trouble. I am a beginner with ONAP. I wanted to install ONAP in an AWS environment. But as I went through your video I found I need the onap-parameters.yaml file, which includes the Openstack credentials. Do I need this for installing ONAP in an AWS environment? I want to install ONAP on an AWS instance only.
Is it optional, or must I have Openstack credentials?
Hi, no, you can put fake user/pass/token strings there for now. When you get to the point of running the use cases - like the vFW - and need to create a customer/tenant/region on AAI - this is where real credentials will be required to authenticate to Keystone. Later when you orchestrate VNFs via SO - full functionality will be required.
For now use the sample one in the repo.
let us know how things work out. And don't hesitate to ask questions about AWS in your case when bringing up the system.
Before I start installing ONAP, can you please help me understand the need for a domain name for installation?
Can't I use an Elastic IP only?
And about the use cases, can you let me know which use cases will work under this installation of ONAP on Kubernetes without having Openstack credentials?
root@ip-10-0-1-113:~# ./cd.sh -b release-1.1.0
Wed Jan 31 06:53:31 UTC 2018
provide onap-parameters.yaml and aai-cloud-region-put.json
vm.max_map_count = 262144
remove existing oom
./cd.sh: line 20: oom/kubernetes/oneclick/setenv.bash: No such file or directory
./cd.sh: line 22: oom/kubernetes/oneclick/deleteAll.bash: No such file or directory
Error: incompatible versions client[v2.8.0] server[v2.6.1]
sleeping 1 min
deleting /dockerdata-nfs
chmod: cannot access '/dockerdata-nfs/onap': No such file or directory
pull new oom
Cloning into 'oom'...
fatal: Remote branch release-1.1.0 not found in upstream origin
start config pod
./cd.sh: line 43: oom/kubernetes/oneclick/setenv.bash: No such file or directory
moving onap-parameters.yaml to oom/kubernetes/config
cp: cannot create regular file 'oom/kubernetes/config': No such file or directory
./cd.sh: line 47: cd: oom/kubernetes/config: No such file or directory
./cd.sh: line 48: ./createConfig.sh: No such file or directory
verify onap-config is 0/1 not 1/1 - as in completed - an error pod - means you are missing onap-parameters.yaml or values are not set in it.
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete....
fatal: Remote branch release-1.1.0 not found in upstream origin
release-1.1.0 was deleted a month ago - yes I had a comment in my cd.sh script as an example for master or that release - I will update the comment to print "amsterdam" - so there is no confusion
Check your cd script output.rtf - you are not running the correct helm version (likely you are running 2.3 - should be running 2.6+ - ideally 2.8.0)
For the vnf image pull - have not looked at this - verify the right tag is being pulled from nexus3 and close off the JIRA if you find it.
If you look at your logs - you will see you have the right # of non-running containers (2) but you will notice that some of your createAll calls are failing on the new template tpl code added last week (yes the author of that change should have notified the community of the pending change - I picked up the comm task later that day).
like the following
Error: parse error in "appc/templates/appc-conf-configmap.yaml": template: appc/templates/appc-conf-configmap.yaml:8: function "tpl" not defined
The command helm returned with error code 1
Check this page for the right version - it changed on Wed.
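A client/server mismatch like the "incompatible versions client[v2.8.0] server[v2.6.1]" error in the cd.sh output can be spotted before running the scripts. The following is a minimal sketch, assuming `helm version` prints one line starting with "Client:" and one starting with "Server:", each containing a vX.Y.Z version string; the helper name helm_versions_match is hypothetical.

```shell
# Hypothetical helper: compare the Helm client and server (Tiller) versions.
# Reads `helm version` output on stdin, prints both versions, and fails when
# they differ or when either one could not be found.
helm_versions_match() {
  awk '
    /^Client:/ { if (match($0, /v[0-9]+\.[0-9]+\.[0-9]+/)) client = substr($0, RSTART, RLENGTH) }
    /^Server:/ { if (match($0, /v[0-9]+\.[0-9]+\.[0-9]+/)) server = substr($0, RSTART, RLENGTH) }
    END {
      printf "client=%s server=%s\n", client, server
      ok = (client != "" && client == server)
      exit (ok ? 0 : 1)
    }'
}

# Usage against a live cluster (left as a comment, not run here):
#   helm version | helm_versions_match || echo "helm client and server differ"
```

Which version is the right one depends on the branch, as discussed in this thread; the sketch only checks that the two sides agree.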
I've attached the helm files I made for this workaround. If you just expand them into ..../oom/kubernetes you should get a directory called ves, and then you can just run ../oneclick/createAll.bash -n onap -a ves
Hi Andrew Fenner, it's nice to see that it works for you. I have an OOM setup without DCAE. Can I now download ves-oom.tar and create the pod? How can I make the other components point to this standalone DCAE model? We have to change vFWCL.zip to give the DCAE collector IP and port, right? Can you give more details on the Closed Loop end?
The file is attached in the last post. The VES and CDAP are integrated with the rest of the components by the k8s dns. The way to expose the VES port is using
When we were doing SDNC preload operation, for SINK and PG, we noticed for the modified json files for SINK ( our values of VNF details and service instance etc), the existing/predefined VFWCL instance got changed? Was it correct?
Image pull errors usually mean you cannot reach nexus3.onap.org - especially that many - which could be your proxy (switch to a cell connection to verify).
Do a manual docker pull to check this.
Another reason could be that you did not source setenv.bash, where the docker repo credentials/url are set
Remember this is Kubernetes not Docker. Kubernetes is a layer on top of Docker - you don't need to run any docker commands except when installing the Rancher wrapper on Kubernetes - after that always use kubectl
Follow the instructions on this wiki "exactly" or use the scripts for your first time install
Pulling docker images yourself is not required - the only reason for the prepull is to speed up the onap startup - for example running the createAll a second time will run faster since the images were pulled earlier.
The images that the values.yaml (s) files pull are the ones pulled automatically by Kubernetes - you don't need later versions unless there are app fixes we have not switched to yet.
If you are having issues with docker pulls then it is in your system behind your firewall - I can't remember if it was you (I answer a lot of support questions here) - did you do a proper source of setenv.bash and also make sure your config pod is OK?
If you really want to see ONAP work usually OK - just to verify your procedure - run it on a VM in public cloud like AWS or Azure and apply that to your local environment. I am thinking that there may be an issue pulling from nexus3 - I have seen this in other corp environments.
I followed the instructions above to run ONAP on Kubernetes, where the server and client are co-located.
I have two issues regarding the implementation:
1. When I check the pods with the kubectl get pods --all-namespaces -a | grep 2/2 command, I receive the following information, in which portal and policy are not listed.
2. In the next step, I followed the VNC-portal video, but the portal pod is not available there either. I tried to add the portal, but an error comes up saying that "the portal already exists". In addition, I looked for the ete-k8s.sh file in dockerdata-nfs, but there are no files there except eteshare and robot!
For 1. Yes, policy and portal should appear in the above 'kubectl' result. I would recommend checking your setenv.bash under $HOME/oom/kubernetes/oneclick and checking which HELM_APPS you are deploying. Make sure it has policy and portal in there.
For 2. ete-k8s.sh is present under $HOME/oom/kubernetes/robot, not under dockerdata-nfs. eteshare under dockerdata-nfs/onap/robot would contain the logs of the run when you execute ete-k8s.sh.
Regarding the first issue: Policy and Portal are there.
Regarding the second issue: I just followed the instructions for the VNC-portal. The video shows that ete-k8s.sh must appear in dockerdata-nfs when running ./createAll.bash -n demo
Because of the portal, I cannot check the AAI endpoints or run the health check!
I think I have mistakenly created two instances: one based on the instructions provided in ONAP on Kubernetes (onap) and the second based on the vnc-portal instructions (demo). Should I delete one of the instances, for example demo? If yes, please tell me what command I should use!
If I delete one instance, does it affect the other one?
When I run kubectl get pods -n onap-portal for onap, I receive the following:
root@omap:~/oom/kubernetes/robot# kubectl get pods -n onap-portal
NAME                             READY   STATUS             RESTARTS   AGE
portalapps-dd4f99c9b-lbm7w       0/2     Init:Error         0          24m
portaldb-7f8547d599-f2wlv        0/1     CrashLoopBackOff   5          24m
portalwidgets-6f884fd4b4-wl84p   0/1     Init:Error         0          24m
vnc-portal-687cdf7845-clqth      0/1     Init:0/4           1          24m
But for demo it is:
root@omap:~/oom/kubernetes/robot# kubectl get pods -n demo-portal
No resources found.
In the other case, when I run the health check (as you mentioned), I receive the following message:
root@omap:~/oom/kubernetes/robot# ./ete-k8s.sh health
No resources found.
error: expected 'exec POD_NAME COMMAND [ARG1] [ARG2] ... [ARGN]'.
POD_NAME and COMMAND are required arguments for the exec command
See 'kubectl exec -h' for help and examples.
I am not sure about the demo-portal. But yes, if the ports are already being used, there would be conflicts when launching similar pod again.
I would recommend clearing up and starting afresh.
Here is what I would do:
Delete the onap containers. Basically follow the steps here.
Before you restart again, execute kubectl get pods --all-namespaces -a to make sure that none of the onap containers are running. Also check whether any 'demo' portal pods are running. You should only see Kubernetes specific pods.
Once clean, run createConfig and then createAll for onap deployment.
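The "only Kubernetes specific pods" check can be sketched as a small filter. This is an illustration only, assuming the Kubernetes specific pods live in the kube-system namespace and that the listing has the namespace in the first column and the pod name in the second; the helper name leftover_pods is hypothetical.

```shell
# Hypothetical helper: from `kubectl get pods --all-namespaces -a` output on
# stdin, print namespace/name for every pod that is NOT in kube-system.
# Empty output means no onap or demo pods are left.
leftover_pods() {
  awk '$1 != "NAMESPACE" && $1 != "kube-system" { print $1 "/" $2 }'
}

# Usage against a live cluster (left as a comment, not run here):
#   kubectl get pods --all-namespaces -a | leftover_pods
```

If the filter prints anything, wait for the deletion to finish or clean up those namespaces before running createConfig and createAll again.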
********** Cleaning up ONAP:
release "demo-consul" deleted
namespace "demo-consul" deleted
clusterrolebinding "demo-consul-admin-binding" deleted
Service account demo-consul-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-msb" deleted
clusterrolebinding "demo-msb-admin-binding" deleted
Service account demo-msb-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-mso" deleted
clusterrolebinding "demo-mso-admin-binding" deleted
Service account demo-mso-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-message-router" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-message-router-admin-binding" not found
Service account demo-message-router-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-sdnc" deleted
clusterrolebinding "demo-sdnc-admin-binding" deleted
Service account demo-sdnc-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-vid" deleted
clusterrolebinding "demo-vid-admin-binding" deleted
Service account demo-vid-admin-binding deleted.
E0201 09:24:42.090532 5895 portforward.go:331] an error occurred forwarding 32898 -> 44134: error forwarding port 44134 to pod 9b031662eac045462b5e018cc6829467a799568021c3a97dfe8d7ec6272e1064, uid : exit status 1: 2018/02/01 09:24:42 socat[7805] E connect(6, AF=2 127.0.0.1:44134, 16): Connection refused
Error: transport is closing
namespace "demo-portal" deleted
clusterrolebinding "demo-portal-admin-binding" deleted
Service account demo-portal-admin-binding deleted.
Error: release: "demo-policy" not found
namespace "demo-policy" deleted
clusterrolebinding "demo-policy-admin-binding" deleted
Service account demo-policy-admin-binding deleted.
Error: release: "demo-appc" not found
Error from server (NotFound): namespaces "demo-appc" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-appc-admin-binding" not found
Service account demo-appc-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-sdc" deleted
clusterrolebinding "demo-sdc-admin-binding" deleted
Service account demo-sdc-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-dcaegen2" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-dcaegen2-admin-binding" not found
Service account demo-dcaegen2-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-log" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-log-admin-binding" not found
Service account demo-log-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-cli" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-cli-admin-binding" not found
Service account demo-cli-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-multicloud" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-multicloud-admin-binding" not found
Service account demo-multicloud-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-clamp" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-clamp-admin-binding" not found
Service account demo-clamp-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-vnfsdk" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-vnfsdk-admin-binding" not found
Service account demo-vnfsdk-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-uui" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-uui-admin-binding" not found Service account demo-uui-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-aaf" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-aaf-admin-binding" not found Service account demo-aaf-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-vfc" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-vfc-admin-binding" not found Service account demo-vfc-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-kube2msb" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-kube2msb-admin-binding" not found Service account demo-kube2msb-admin-binding deleted.
Error: could not find a ready tiller pod Error from server (NotFound): namespaces "demo-esr" not found Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-esr-admin-binding" not found Service account demo-esr-admin-binding deleted.
Error: could not find a ready tiller pod namespace "demo" deleted Waiting for namespaces termination...
Apart from that, I tried to delete both the demo and onap deployments, but again without success.
Here is the error for the second command (./deleteAll.bash -n onap):
root@omap:~/oom/kubernetes/oneclick# ./deleteAll.bash -n demo
Current kubectl context does not match context specified: ONAP
You are about to delete deployment from: ONAP
To continue enter context name: demo
Your response does not match current context! Skipping delete ...
root@omap:~/oom/kubernetes/oneclick#
Some of the earlier errors are normal - I have seen these on half-deployed systems
if the following shows pods still up (except the 6 for kubernetes) even after a helm delete --purge - then you could also start from scratch - delete all of your kubernetes and rancher docker containers
also try to follow the tutorial here "exactly" if this is your first time running onap - or use the included scripts - you won't have any issues that way.
Also just to be safe - because there may be some hardcoding of "onap" - it was hardcoded in places under helm 2.3 because we could not use the tpl template until 2.6 (we only upgraded to 2.8 last week)
I am totally new to ONAP. I followed the tutorial exactly, but once I tried to add vnc-portal, the errors came up. The vnc-portal instructions mention that a demo instance needs to be created for the portal, which conflicts with the onap instance (it seems that running two instances is complicated!).
As you suggested, I deleted the pods, but one of them is still in the Terminating state. Should I ignore that, or should I start from scratch?
root@omap:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
demo-sdnc sdnc-dbhost-0 0/2 Terminating 1 2d
kube-system heapster-76b8cd7b5-z99xr 1/1 Running 0 3d
kube-system kube-dns-5d7b4487c9-zc5tx 3/3 Running 735 3d
kube-system kubernetes-dashboard-f9577fffd-c8bgs 1/1 Running 0 3d
kube-system monitoring-grafana-997796fcf-mgqd9 1/1 Running 0 3d
kube-system monitoring-influxdb-56fdcd96b-pnbrj 1/1 Running 0 3d
kube-system tiller-deploy-74f6f6c747-7cvth 1/1 Running 373 3d
Everything is normal except for the failed SDNC container deletion - I have seen this on another system 2 days ago - something went into master for SDNC that caused this. For that particular machine I deleted the VM and raised a new spot VM - a helm delete --purge had no effect - even killing the docker container outside of kubernetes had no effect. I have notes on this and will raise a JIRA - the next system I raised for the CD jobs did not have the issue anymore.
Hi, the DNS name rackspace.onap.info is my own domain - it is just an example - I use a domain to avoid an IP address. In your case you will need to use the IP address of your VM to launch the UI and register a host and not the IP/DNS of my host.
If I still had that system up - then you would have actually registered your host to one of my own OOM deployments and my pods would have started appearing on your system - impossible because our two deployments use different generated client tokens anyway.
When I used the following command to add portal to ONAP (./createAll.bash -n onap -a robot), I received CrashLoopBackOff in the portal pods, and the portal pods stay in the Init state for a long time:
root@omap:~/oom/kubernetes/config# kubectl get pods -n onap-portal
NAME READY STATUS RESTARTS AGE
portalapps-dd4f99c9b-t8hwz 0/2 Init:0/3 1 17m
portaldb-7f8547d599-ppjmx 0/1 CrashLoopBackOff 7 17m
portalwidgets-6f884fd4b4-w2pc7 0/1 Init:0/1 1 17m
vnc-portal-687cdf7845-95bq7 0/1 Init:0/4 1 17m
Can you tell me what this is for, and how I can solve these two problems?
Looks like you are mixing namespaces and pods - you have 2 namespaces (in effect you are bringing up 2 different onap installations in namespaces -n onap and -n onapportal)
do
./createAll.bash -n onap
to bring everything up (recommended)
or
./createAll.bash -n onap -a robot
./createAll.bash -n onap -a portal
to bring up portal and robot - but if you check the portal yaml you will see it has dependencies - try to bring up all of onap or the subset in the helm_apps variable in setenv.sh (you need aai, vid...etc - for portal pods to come up)
Michael O'Brien Thanks for your kind and helpful answers.
As you suggested, I am bringing up all the namespaces related to the portal. So far all the namespaces are up except one, which comes up with an error. Could you tell me what this error is and how I should solve it?
root@omap:~/oom/kubernetes/oneclick# ./createAll.bash -n onap -a sdnc
********** Creating instance 1 of ONAP with port range 30200 and 30399
********** Creating ONAP:
********** Creating deployments for sdnc **********
Creating namespace ********** namespace "onap-sdnc" created
Creating service account ********** clusterrolebinding "onap-sdnc-admin-binding" created
Creating registry secret ********** secret "onap-docker-registry-key" created
Creating deployments and services **********
E0205 14:45:32.429304 32344 portforward.go:331] an error occurred forwarding 36188 -> 44134: error forwarding port 44134 to pod 45eb9cfb4133dd7aa42454821eb8ad61fe179e3ad1375e22fd9f5ade6b2a2c2f, uid : exit status 1: 2018/02/05 14:45:31 socat[1991] E connect(5, AF=2 127.0.0.1:44134, 16): Connection refused
Error: transport is closing
The command helm returned with error code 1
Hi, that is normal/known behaviour. For aaf and vfc - those 2 don't work and have been busted for at least 6 weeks - ignore them - they are being fixed and you don't need them for any vFW, vDNS activity.
For all the rest - the SDNC containers - this is a known intermittent issue with deployments behind a slow network connection (for example I don't get these on AWS) - the readiness probes have timed out before the docker images were pulled (only one needs to fail and the rest in the hierarchy tree are waiting for it)
The fix is to delete sdnc and recreate it:
./deleteAll.bash -n onap -a sdnc
./createAll.bash -n onap -a sdnc
Both of these issues are documented in several places and on onap-discuss
For example a latest clean install - on the first run - same issue - 2nd run OK - but periodically we still get an SDNC issue because of the closeness of the pull time to the readiness timeout retries
Would be nice to fix these intermittent deploy issues (usually on clean vm's)
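The delete-and-recreate workaround above can be wrapped in a small function. This is a minimal sketch, not something that ships with OOM: the name bounce_component and the DRY_RUN switch are made up for illustration, and it assumes it is run from the oneclick directory and that deleteAll.bash waits for the namespace to terminate before returning (the "Waiting for namespaces termination..." output earlier in this thread suggests so, but that is an assumption).

```shell
#!/usr/bin/env bash
# bounce_component: hypothetical wrapper around the oneclick scripts.
# Usage: bounce_component <namespace-prefix> <component>
# With DRY_RUN=1 the two commands are only printed, not executed.
bounce_component() {
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  # Delete first; only recreate if the delete step did not fail.
  $run ./deleteAll.bash -n "$1" -a "$2" || return 1
  $run ./createAll.bash -n "$1" -a "$2"
}
```

For example, DRY_RUN=1 bounce_component onap sdnc prints the two commands from the reply above, which is a cheap way to catch a mistyped component name before anything is deleted.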
No, multivim has no runtime use right now - especially for the vFW - in the future when the azure seed code comes in it may work with SO during orchestration.
VF-c as well - not required
You only need the original onap 1.0 level seed code components - in the diagram
Here's the output for amsterdam branch (commit: c27640a084242f77600a8630b475772094ae314a):
user@vm000949:~/onap/oom/kubernetes/oneclick$ kubectl get pod -n onap-portal
NAME READY STATUS RESTARTS AGE
portalapps-59574d47cc-mnzjn 2/2 Running 0 4m
portaldb-74799f6758-zktwp 1/1 Running 0 4m
portalwidgets-5b9ffcd879-m7svb 1/1 Running 0 4m
vnc-portal-588f7768df-xrd7j 0/1 CrashLoopBackOff 4 4m
last parts of kubectl describe pod vnc-portal-588f7768df-xrd7j -n onap-portal:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m default-scheduler Successfully assigned vnc-portal-588f7768df-xrd7j to vm000949
Normal SuccessfulMountVolume 6m kubelet, vm000949 MountVolume.SetUp succeeded for volume "localtime"
Normal SuccessfulMountVolume 6m kubelet, vm000949 MountVolume.SetUp succeeded for volume "ubuntu-init"
Normal SuccessfulMountVolume 6m kubelet, vm000949 MountVolume.SetUp succeeded for volume "vnc-profiles-ini"
Normal SuccessfulMountVolume 6m kubelet, vm000949 MountVolume.SetUp succeeded for volume "default-token-826wp"
Normal Pulling 5m (x3 over 6m) kubelet, vm000949 pulling image "dorowu/ubuntu-desktop-lxde-vnc"
Normal Pulled 5m (x3 over 6m) kubelet, vm000949 Successfully pulled image "dorowu/ubuntu-desktop-lxde-vnc"
Normal Created 5m (x3 over 6m) kubelet, vm000949 Created container
Normal Started 5m (x3 over 6m) kubelet, vm000949 Started container
Warning FailedPostStartHook 5m (x3 over 6m) kubelet, vm000949 Exec lifecycle hook ([/bin/sh -c mkdir -p /root/.mozilla/firefox/onap.default; cp /root/.init_profile/profiles.ini /root/.mozilla/firefox/; echo 'user_pref("browser.tabs.remote.autostart.2", false);' > /root/.mozilla/firefox/onap.default/prefs.js; cat /ubuntu-init/hosts >> /etc/hosts]) for Container "vnc-portal" in Pod "vnc-portal-588f7768df-xrd7j_onap-portal(58e57acc-0b49-11e8-bb68-0800277511ce)" failed - error: command '/bin/sh -c mkdir -p /root/.mozilla/firefox/onap.default; cp /root/.init_profile/profiles.ini /root/.mozilla/firefox/; echo 'user_pref("browser.tabs.remote.autostart.2", false);' > /root/.mozilla/firefox/onap.default/prefs.js; cat /ubuntu-init/hosts >> /etc/hosts' exited with 1: cat: /ubuntu-init/hosts: No such file or directory
, message: "cat: /ubuntu-init/hosts: No such file or directory\n"
Normal Killing 4m (x3 over 5m) kubelet, vm000949 Killing container with id docker://vnc-portal:FailedPostStartHook
Warning BackOff 1m (x15 over 5m) kubelet, vm000949 Back-off restarting failed container
Here's the full log of kubectl logs portalapps-59574d47cc-mnzjn -n onap-portal -c portalapps: https://pastebin.com/5EHN9y1x
Same is for master branch (commit bce13fa0b25fb7932d5ad1be748541682329853c), except vnc-portal does not fail with /ubuntu-init/hosts: No such file or directory, rather it waits for portalapps to startup. So root cause IMO is in portalapps, possibly mysql JDBC driver is missing from classpath? telnet portaldb.onap-portal 3306 works from the portal-apps container
Here's the output of docker images:
user@vm000949:~/onap/oom/kubernetes/oneclick$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
oomk8s/config-init 1.1.15 295541966a5f 5 days ago 918MB
consul latest cdbd79f2a13e 12 days ago 52.3MB
mysql/mysql-server 5.6 30dc57b553c0 13 days ago 226MB
oomk8s/config-init 2.0.0-SNAPSHOT c6b39f41e9cc 2 weeks ago 704MB
gcr.io/kubernetes-helm/tiller v2.8.0 7257caf71e74 2 weeks ago 71.5MB
dorowu/ubuntu-desktop-lxde-vnc latest 8ae3f0f55103 3 weeks ago 1.31GB
k8s.gcr.io/kubernetes-dashboard-amd64 v1.8.1 e94d2f21bc0c 7 weeks ago 121MB
google/cadvisor latest 75f88e3ec333 2 months ago 62.2MB
hello-world latest f2a91732366c 2 months ago 1.85kB
gcr.io/google-containers/kube-addon-manager v6.5 d166ffa9201a 2 months ago 79.5MB
nexus3.onap.org:10001/openecomp/appc-image v1.2.0 399e222d320b 2 months ago 3.04GB
nexus3.onap.org:10001/onap/ccsdk-dgbuilder-image v0.1.0 3e4649f81feb 2 months ago 980MB
gcr.io/k8s-minikube/storage-provisioner v1.8.0 4689081edb10 3 months ago 80.8MB
gcr.io/k8s-minikube/storage-provisioner v1.8.1 4689081edb10 3 months ago 80.8MB
nexus3.onap.org:10001/onap/msb/msb_apigateway 1.0.0 8245d1b34d29 3 months ago 215MB
nexus3.onap.org:10001/onap/msb/msb_discovery 1.0.0 1ab27b2abcfe 3 months ago 201MB
nexus3.onap.org:10001/onap/portal-wms v1.3.0 fff0077a3c33 3 months ago 237MB
nexus3.onap.org:10001/onap/portal-apps v1.3.0 8da6312ec821 3 months ago 677MB
nexus3.onap.org:10001/onap/portal-db v1.3.0 7578762221a7 3 months ago 398MB
k8s.gcr.io/k8s-dns-sidecar-amd64 1.14.5 fed89e8b4248 4 months ago 41.8MB
k8s.gcr.io/k8s-dns-kube-dns-amd64 1.14.5 512cd7425a73 4 months ago 49.4MB
k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64 1.14.5 459944ce8cc4 4 months ago 41.4MB
consul 0.9.3 8d44ae3c4e67 4 months ago 51.4MB
oomk8s/config-init 1.0.1 43b5cc139a47 5 months ago 324MB
docker.elastic.co/beats/filebeat 5.5.0 b61327632415 7 months ago 271MB
oomk8s/ubuntu-init 1.0.0 14bb4db11858 8 months ago 207MB
oomk8s/readiness-check 1.0.0 d3923ba1f99c 8 months ago 579MB
oomk8s/mariadb-client-init 1.0.0 a5fa953bd4e0 8 months ago 251MB
nexus3.onap.org:10001/openecomp/portalapps 1.0-STAGING-latest ee0e08b1704b 10 months ago 1.04GB
nexus3.onap.org:10001/openecomp/portaldb 1.0-STAGING-latest f2a8dc705ba2 10 months ago 395MB
k8s.gcr.io/echoserver 1.4 a90209bb39e3 20 months ago 140MB
gcr.io/google_containers/pause-amd64 3.0 99e59f495ffa 21 months ago 747kB
k8s.gcr.io/pause-amd64 3.0 99e59f495ffa 21 months ago 747kB
nexus3.onap.org:10001/mariadb 10.1.11 d1553bc7007f 24 months ago 346MB
I'm running Ubuntu 16.04.3 LTS on VirtualBox. And instead of rancher and full kubernetes, I'm using minikube with the --vm-driver=none flag (I believe this should not matter in this case). Any ideas?
I had the same problem with the Amsterdam branch. The master branch has fixes that resolve this. Basically, the helm chart defines a lifecycle postStart hook, which may run before the container itself has started (the ordering is not guaranteed). So please take the portal folder from the master branch and use it to replace the one in Amsterdam, or just replace the resources folder inside portal, and also the portal-vnc-dep.yaml file inside template, with the versions from master.
Portal works fine in both amsterdam and beijing, You just have to stick to the two version sets
If you see an issue raise a JIRA, describe the problem, ideally provide a workaround or patch and link to existing/related/blocked Jiras so we can review them.
vnc-portal still has issues under the latest version like Helm 2.5+, rancher 1.6.11+ that is OK in master - make sure you are running older versions that match what amsterdam runs with..
Also vnc-portal has dependencies - do you have everything else running like vid for example - i don't see all the dependent pods defined in the yaml for vnc-portal in your image list - just appc and msb.
Remember portal runs against the other components in onap - try to bring the entire system up first and then, when you are running OK, start adjusting the system one step at a time to help triage issues. Right now you are dealing with a manager different than the RI, a subset of ONAP deployed and
post your docker, kubernetes, kubectl, helm versions.
Also, to aid in getting the system up for everyone, we have standardized on rancher for now - with secondary support for kubeadm.
Yes I realised later that I need other components as well to run portal, so I decided to do full installation using rancher. However error message in portalapps (java.sql.SQLException: No suitable driver) makes me think there is still something wrong. Anyways I'll try full setup with rancher and let you know how it goes.
I have another issue now, I'm following QuickstartInstallation from master branch (ce7844b207021251ec76a5aa5d7b8c1de3555a12) and prepull fails with following error:
Error parsing reference: "2.0.0-SNAPSHOT" is not a valid repository/tag: repository name must be lowercase
I'm not sure if I'm supposed to update values.yml file. For now I'll continue using amsterdam branch without prepulling
********** Creating instance 1 of ONAP with port range 30200 and 30399
********** Creating ONAP:
********** Creating deployments for dcaegen2 **********
Creating namespace ********** namespace "onap-dcaegen2" created
Creating service account ********** clusterrolebinding "onap-dcaegen2-admin-binding" created
Creating registry secret ********** secret "onap-docker-registry-key" created
Creating deployments and services ********** secret "dcaegen2-openstack-ssh-private-key" created configmap "dcaegen2-config-inputs" created NAME: onap-dcaegen2 LAST DEPLOYED: Wed Feb 7 08:28:38 2018 NAMESPACE: onap STATUS: DEPLOYED
RESOURCES: ==> v1/Pod NAME READY STATUS RESTARTS AGE dcaegen2 0/1 Pending 0 0s
hit f12 (developer mode) (more tools | developer tools | network tab) to see the underlying http calls and rest calls happening - it should show you what is non-200.
You should verify AAI is up by hitting the nodeport. - direct rest calls can be made there without going through vnc-portal
We have a 4-node k8s cluster with 16 GB RAM per node (OpenStack). We are experiencing 'OutOfDisk' on 3 of the 4 nodes, see below. It seems the scheduler does not balance the components' memory requirements, and I do not see a memory limit configuration in the OOM deployment definition files. We will investigate the components' memory usage at runtime further. Do you have any suggestions or plans for this?
I noticed you are running helm 2.8 in amsterdam - only beijing can run the latest helm.. Your environment is mixed, I would expect kubernetes 1.8 on the server. Your docker version is ok at 1.12 instead of 17.3 (beijing)
Amsterdam still only supports the older version set
Rancher 1.6.10
Kubernetes 1.8.6
Docker 1.12
Helm 2.3 (both client and server)
If you had that many prepull issues with docker images I would suspect your network.
Do a full delete and create of the pods to bounce them now that the images are pulled.
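The amsterdam version set listed above (Rancher 1.6.10, Kubernetes 1.8.6, Docker 1.12, Helm 2.3) can be compared against an environment with a few lines of shell. This is a rough sketch under assumptions: the function names are made up, the versions are passed in by hand rather than queried from the tools, and a reported version counts as matching when it equals the expected value or only adds a patch level.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of OOM. The expected values are the
# amsterdam version set quoted in this thread.

# expect_version <name> <expected> <actual>
# Succeeds when <actual> equals <expected> or extends it with a further
# ".<something>" (expected 2.3 accepts 2.3 and 2.3.0, but not 2.8.0).
expect_version() {
  case "$3" in
    "$2"|"$2".*) return 0 ;;
    *) echo "$1: expected $2, got $3"; return 1 ;;
  esac
}

# check_amsterdam_versions <rancher> <kubernetes> <docker> <helm>
# Prints one line per mismatch and returns non-zero if there is any.
check_amsterdam_versions() {
  rc=0
  expect_version rancher    1.6.10 "$1" || rc=1
  expect_version kubernetes 1.8.6  "$2" || rc=1
  expect_version docker     1.12   "$3" || rc=1
  expect_version helm       2.3    "$4" || rc=1
  return "$rc"
}
```

Calling check_amsterdam_versions 1.6.10 1.8.6 1.12.6 2.8.0 reports only the helm mismatch, which is the mixed-environment case described above.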
and initiated the prepull.sh script. After 6-7hrs I see that no nexus images were downloaded and then I have rerun the "docker pull" manually on one of the images. It is badly slow.
The prepull.sh with higher versions of helm and kubectl was way better.
The only change is the versions of helm and kubectl as against the old VM. And of course the prepull_docker.sh.
Any thoughts about the cause?
Also, it would be a great help to get the count of docker images for a successful deployment. "docker images|wc -l"
the prepull script is not required and it has nothing to do with helm and kubectl (it is just a script that parses image names and tags and then does a docker pull on each) - it is there just so that all the images are available when you bring up the pods - otherwise all the dependencies will need to wait until images load - which usually exceeds the wait times for the pods - which means you need to bring up onap twice.
all the 95 images that the prepull script gathers from the values.yamls take 15 to 20 min on an AWS instance for me.
this means one of two things if you are taking hours to prepull images
1) your internal network has a proxy and is slowing things down - you don't mention if you are inside a firewall
2) you are in a region (asia) that is known to have issues pulling from the nexus3 servers (the LF hosts them in some AWS region that I don't know) - there are many reports of a mirror being required for China and India - perhaps this is your issue.
bottom line is do a docker images - check that all the images are there - optionally turn off pulling automatically in the yamls
As a test - can you verify that you are following the correct procedure by reproducing the RI at the top of the page
get a spot VM on AWS in the us-west region (ohio is currently 0.07/hour for a 64g R4.2xLarge) - install oom there and you should be up in an hour (5 min for rancher/k8s/helm/docker, 20m for docker pulls, 20m to bring up onap)
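As described above, the prepull step boils down to collecting image references and running docker pull on each. The sketch below is not the real prepull script. It assumes the references have already been extracted one per line, and it adds a sanity filter so that a bare tag such as the 2.0.0-SNAPSHOT reported earlier in this thread is skipped instead of being handed to docker. The names valid_image_refs, prepull and PULL_CMD are made up for illustration, and the pattern is only an approximation of what docker accepts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real prepull script.

# valid_image_refs: reads candidate image references on stdin, one per
# line, and prints only those shaped like <lowercase repository>:<tag>.
valid_image_refs() {
  grep -E '^[a-z0-9][a-z0-9._/:-]*:[A-Za-z0-9_][A-Za-z0-9_.-]*$'
}

# prepull: pulls every valid reference read from stdin.
# PULL_CMD defaults to "docker pull"; set PULL_CMD=echo for a dry run.
prepull() {
  valid_image_refs | while read -r ref; do
    ${PULL_CMD:-docker pull} "$ref"
  done
}
```

Running it with PULL_CMD=echo first shows exactly which references would be pulled, and the line count of that output is also a reasonable estimate of how many images to expect afterwards.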
.simpledemo.onap.org. -f=yaml -c id ++ awk '{ print $2} ' Could not find requested endpoint in Service Catalog. + SIMPLEDEMO_ONAP_ORG_ZONE_ID=
......................
I have the following configuration for DCAE in onap-parameters.yaml.
########
# DCAE #
########
# Whether or not to deploy DCAE
# If set to false, all the parameters below can be left empty or removed
# If set to false, update ../dcaegen2/values.yaml disableDcae value to true,
# this is to avoid deploying the DCAE deployments and services.
DEPLOY_DCAE: "true"
# ------------------------------------------------#
# OpenStack Config on which DCAE will be deployed #
# ------------------------------------------------#
# Whether to have DCAE deployed on the same OpenStack instance on which VNF will be deployed.
# (e.g. re-use the same config as defined above)
# If set to true, discard the next config block, else provide the values.
IS_SAME_OPENSTACK_AS_VNF: "true"
# Fill in the values in below block only if IS_SAME_OPENSTACK_AS_VNF set to "false"
# ---
# Either v2.0 or v3
DCAE_OS_API_VERSION: ""
DCAE_OS_KEYSTONE_URL: ""
DCAE_OS_USERNAME: ""
DCAE_OS_PASSWORD: ""
DCAE_OS_TENANT_NAME: ""
DCAE_OS_TENANT_ID: ""
DCAE_OS_REGION: ""
# ---
# We need to provide the config of the public network here, because the DCAE VMs will be
# assigned a floating IP on this network so one can access them, to debug for instance.
# The ID of the public network.
# This is the private network that will be used by DCAE VMs. The network will be created during the DCAE boostrap process,
# and will the subnet created will use this CIDR.
DCAE_OS_OAM_NETWORK_CIDR: "10.99.0.0/27"
# This will be the private ip of the DCAE boostrap VM. This VM is responsible for spinning up the whole DCAE stack (14 VMs total)
DCAE_IP_ADDR: "10.99.0.2"
# The flavors' name to be used by DCAE VMs
DCAE_OS_FLAVOR_SMALL: "m1.small"
DCAE_OS_FLAVOR_MEDIUM: "m1.medium"
DCAE_OS_FLAVOR_LARGE: "m1.large"
# The images' name to be used by DCAE VMs
DCAE_OS_UBUNTU_14_IMAGE: "ubuntu-14.04-server-cloudimg"
DCAE_OS_UBUNTU_16_IMAGE: "ubuntu-16.04-server-cloudimg"
DCAE_OS_CENTOS_7_IMAGE: "centos7-cloudimg"
# This is the keypair that will be created in OpenStack, and that one can use to access DCAE VMs using ssh.
# The private key needs to be in a specific format so at the end of the process, it's formatted properly
# when ending up in the DCAE HEAT stack. The best way is to do the following:
# - copy paste your key
# - surround it with quote
# - add \n at the end of each line
# - escape the result using https://www.freeformatter.com/java-dotnet-escape.html#ad-output
# Proxy DNS Designate. This means DCAE will run in an instance not support Designate, and Designate will be provided by another instance.
# Set to true if you wish to use it
DNSAAS_PROXY_ENABLE: "false"
# -----------------------------------------------------#
# OpenStack Config on which DNS Designate is supported #
# -----------------------------------------------------#
# If this is the same OpenStack used for the VNF or DCAE, please re-enter the values here.
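The key formatting steps in the comments of the configuration above (copy the key, surround it with quotes, add \n at the end of each line) can be done with one awk call. This is a sketch under assumptions: the function name format_private_key is made up, double quotes are assumed for "quote", and the final escaping step that the comments delegate to the freeformatter page is not performed here.

```shell
#!/usr/bin/env bash
# format_private_key: hypothetical helper, not part of OOM.
# Reads a PEM private key on stdin and prints it as a single line:
# an opening double quote, each original line followed by a literal
# backslash-n, and a closing double quote.
# The extra escaping pass mentioned in the config comments is NOT done.
format_private_key() {
  awk 'BEGIN { printf "\"" } { printf "%s\\n", $0 } END { print "\"" }'
}
```

Typical use would be format_private_key < ~/.ssh/id_rsa, where the key path is only an example; the output can then be passed through the escaping step before it goes into onap-parameters.yaml.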
Michael O'Brien I have Rancher 1.6.10, Kubernetes 1.8.6, Docker 1.12 and helm 2.3
./cd.sh -b amsterdam is giving below error
**** Creating configuration for ONAP instance: onap
namespace "onap" created
Error: YAML parse error on config/templates/pod.yaml: error converting YAML to JSON: yaml: line 57: did not find expected key
**** Done ****
verify onap-config is 0/1 not 1/1 - as in completed - an error pod - means you are missing onap-parameters.yaml or values are not set in it.
The file pod.yaml format is invalid as per yamllint.com, but same file is there on my other machine too where onap is running fine.
Below line in "createConfig.sh" throws the error, as per my analysis
I have only tested on Ubuntu 16.04 - you are free to try on Redhat 7.3 - let us know if there are any issues by adding a section to this page when you get it working.
in your error you missed pasting what "expected key" was - as in which of the keys in setenv.sh are missing.
There should be no issues with the config pod - I ran it twice last week on amsterdam.
If your config pod fails it means any of the following
you forgot to source setenv.bash
the config docker image is busted from the last build/push - check the version tag
either your server or client are running helm 2.4+
someone merged helm template code in the yaml that requires helm 2.4+
a couple of other reasons i mentioned in a mail to onap-discuss
I am still facing issues with onap to openstack connectivity. If you can please point me to some notes which you used to setup openstack, e.g. created public/private networks, enabled ssh to openstack, enabled connectivity between VMs in openstack. I will really appreciate that.
I'll post my openstack heat template - based on the onap template - to help bring up a VM specific to openstack.
It is a single VM - connectivity is via the public network on your openstack. Remember that the dcae-bootstrap heatbridge will bring up all the DCAE VMs - so this is dynamic. All you need to provide is the tenant-id, tenant name, the keystone urls and the ability to create 15 EIPs - most of it is through the cloudify manager.
I have all the 89 pods up and running(without dcae).
However the robot health check always fails for ASDC component.
------------------------------------------------------------------------------ Basic ASDC Health Check | FAIL | DOWN != UP ------------------------------------------------------------------------------
I have tried deleting and recreating the pod and also the entire OOM redeployment. The problem persists.
First thing I would check is the logs on the container, then the CD job as a comparison, then recent commits on the OOM infrastructure side, then commits to SDC itself. After this trace through the startup and/or debug the health check to start. I can do these for you.
1. We will wait for the fix for the SDC healthcheck issue or a workaround.
2. Request you to share the onap-parameters.yaml you have used to deploy dcaegen2. I have an all-in-one openstack setup and no DNS designate/forwarder. What should be the value of the following parameters in my onap-parameters.yaml:
I am fighting with that U-EB issue some time already, various different deployments (both amsterdam & master). I assume that it's not just about this SDC healthcheck, but it simply means that SDC can't talk to DMAAP, is this correct ? Is this show-stopper for ONAP functionality and to get e.g. vFWCL demo running ?
if anyone will fix this red-herring, please share,
I found something, in my case it was configuration problem, U-EB server IP is IMHO wrongly calculated in init/config-init.sh via:
kubectl get nodes -o jsonpath='{ $.items[*].status.addresses[?(@.type=="ExternalIP")].address }'
which is external IP of host and will never match my DMAAP service, I patched /dockerdata-nfs/onap/sdc/environments/AUTO.json redeployed sdc and this one passed !
I am running the master (beijing) release and I am having trouble getting one particular sdc container to run: sdc-be is showing the ImagePullBackOff error.
Can you please suggest the way forward.
Output of kubectl command of sdc-be pod as follows:
kubectl describe pod sdc-be-74488cb585-wpcdn -n onap-sdc
filebeat-onap:
  Container ID:
  Image: docker.elastic.co/beats/filebeat:5.5.0
  Image ID:
  Port: <none>
  State: Waiting
    Reason: ImagePullBackOff
  Ready: False
  Restart Count: 0
  Environment: <none>
  Mounts:
    /usr/share/filebeat/data from sdc-data-filebeat (rw)
    /usr/share/filebeat/filebeat.yml from filebeat-conf (rw)
    /var/log/onap from sdc-logs-2 (rw)
    /var/run/secrets/kubernetes.io/serviceaccount from default-token-t94b5 (ro)
I'm trying to get up to date on how to deploy the DCAE components.
From that page it looks like (at least at the beginning) DCAE could not be deployed from OOM, and needed to be deployed using a heat file in OpenStack.
I know that there are efforts to make it deployable by k8s. What is the state of OOM k8s deployment of DCAE today?
For a design environment where we try to limit the footprint of the deployment, are there good ways to get this up and running in a simple form?
I was a bit confused with all the references to DCAE.
If I get this right, OOM (Amsterdam branch) will kick off DCAE, but the DCAE components will be deployed as VMs in an OpenStack cloud, not as k8s pods in the ONAP VM.
---
From another page I found that requirements for a full ONAP is:
The ONAP installation requires the following footprint:
29 VM
148 vCPU
336 GB RAM
3 TB Storage
29 floating IP addresses
What is the smallest size the environment can be (I guess here I mean only the DCAE OS cloud) to get DCAE "working" for a design deployment?
Based on the page ONAP Deployment Specification for Finance and Operations, I think you can cobble together a setup where you have everything, except DCAE, in kubernetes in one VM, and DCAE in separate VM(s). Unfortunately, I haven't found more concrete instructions than that page.
The page Minimal Assets for Physical Lab currently lists 3 different environment sizes for different use cases. Perhaps it could be useful to you.
Sorry, I'm a newbie, and although I'm happy to share what I know and have found, I don't know much more at this time.
I have observed the memory footprint go to 69G after a week on a 122G VM on AWS that was idle - so yes we have essentially crossed the 64G barrier - I will update the RI requirements. On a 64G machine we now saturate to 63G within 48h.
To help you out - you can run with a reduced number of ONAP components - I have been doing this at customer sites recently.
Unless you are running advanced use cases like vVolte or vCPE you can delete the following
Did the nexus3.onap.org:10001/onap/refrepo:1.0-STAGING-latest image for refrepo in the vnfsdk component just get updated? I can't pull this image anymore, although I could pull it successfully last week.
This is the error message I got when I try to pull it today
both the heat and oom installations are stable - if you want DCAE via OOM use only amsterdam. If you want the latest code use Beijing.
As you are aware we have minor manual integration testing and no real continuous deployment so whether a particular component works on any day is up to random chance
With the amsterdam branch I am unable to access the services at their default ports (e.g. portalapps 8989).
Based on the Kubernetes service configuration, it looks like services are exposed using NodePort (within the range 30000-32767). Even if we access a service on its configured NodePort, some services such as portalapps redirect back to the source port.
Use vnc-portal on 30211 -everything works there because that VM in a container is inside the namespace to resolve ports like 8989- there are workarounds to put in a port redirector in Firefox that Vitaliy showed me - Ideally we get this workaround into a JIRA. Sorry but the ports are currently hardcoded in ONAP , OOM just serves them up.
Amsterdam works relatively fine except for the odd SDC 500/503 and SDNC issue during VF-module creation.
I have deployed onap oom on 64 GB RAM 300 GB HDD VM. All pods are running
After 3-4 days, CPU utilization becomes close to 200% and I need to restart VM or reinstall ONAP. I have faced this multiple times. Why does this happen?
We can also confirm the problem reported originally by Kiran Kamineni. In our case the solution proposed also worked but we have slightly different outcome this morning.
Any ideas as to why one of the pods crashes and two pods are stuck in the init state? Our environment is behind a proxy.
aai-service (both pods) depend on aai-traversal and aai-resources (so the root service is blocked) - see the resources yaml files for reference
aai-traversal has timed out - just delete the aai-resources container and kubernetes will restart it. then when it is 1/1 delete aai-service so it gets recreated.
note: there are limited number of resets - usually if you are not working after 30 min - you will never work until you bounce some pods
Just noticed your date - thought you were before the 16th - you were the first to discover the kubernetes 1.8.9 regression that was ported by Rancher back to 1.6.14
you and the one of the CD systems that had no watcher for a couple days
Hard to keep all the changing references up to date - I do however keep the root oom_entrypoint.sh script current - as this is downloaded in my CD system to bring everything else in
Use the attachment here - I will remove all oom_rancher_setup.sh and cd.sh references from the wiki - until we get all this committed in the OOM repo
There was no oom_rancher_install.sh in the directory, but there was an oom_rancher_setup.sh (which seemed to match the next part of the instructions), so I used it instead, and it seemed to do a lot of nice things. BTW, I got an error on first running it as user ubuntu, so I tried again using sudo, which worked.
yes thanks - a mix of old and new edits - I used to name that script oom_rancher_install before I ported it to the OOM repo as oom_rancher_setup - like the captured output says.
Anyway I fine tuned the rancher script - it has been tested on openstack, azure, aws. You can use it to fully provision an Ubuntu 16 box - like you did - and yes if running non-root - do a sudo.
Usually you have to log out/in to pickup the ubuntu user as docker enabled - but lately not.
If you actually run the oom_entrypoint.sh script - you can walk away - assuming the branch is stable - and return after 80 min with a running system - note however that you should comment out cd.sh and replace your own onap-parameters.yaml before running it.
the rancher script is agnostic but very sensitive to the IP or DNS name for the server.
Michael, one concept that is not jumping out at me: what is the command for running a helm chart for a new app after k8s is deployed using these instructions? i.e. when first developing your helm chart
I was able to create/deploy the vFirewall package (packet generator, sink and firewall VNF) on the OpenStack cloud, but I was not able to log in to any of the VNF VMs.
After debugging, I see that I didn't replace the default public key with our local public key in the PACKET GENERATOR curl JSON UI. I am now deploying the VNF again (same vFirewall package) on the OpenStack cloud, and plan to supply our local public key in both the pg and sink JSON APIs.
I have some queries for clarification: - How can we create a VNF package manually/dynamically using the SDC component (so that we can get into the VNF VM and use its capabilities)? - I want to implement service function chaining for the deployed vFirewall; please let me know how to proceed with that.
PS: I have installed/deployed ONAP using Rancher on Kubernetes (on an OpenStack cloud platform) without the DCAE component, so I can't use Closed Loop Automation.
Is there a new problem with the robot pod? I'm using a freshly cloned master branch oom.git and see the following error when attempting to create the robot pod.
root@onap:~/oom/kubernetes/oneclick# ./createAll.bash -n onap -a robot
********** Creating instance 1 of ONAP with port range 30200 and 30399
********** Creating ONAP:
********** Creating deployments for robot **********
Creating deployments and services **********
Error: found in requirements.yaml, but missing in charts/ directory: common
For what it's worth, I haven't changed docker (17.03.2-ce), rancher (server 1.6.14, agent 1.2.9), kubectl (server gitversion 1.8.5-rancher1, client gitversion 1.8.6), and helm (2.6.1) since Mar. 16.
The changes I have since Mar. 16 are ONAP images and oom.git.
Rancher has closed the issue as they have added 1.8.10 to 1.6.14 - retesting
Unfortunately Rancher 1.6.14, which was released months ago, has gone through 3 versions of Kubernetes in 7 days (1.8.5, 1.8.9 and 1.8.10) - need to see if we are compatible and also that helm 2.6.1 is ok with 1.8.9
I'm not sure what changed, whether it's that I'm manually starting each component and waiting for a component to become ready before starting a second component.
But there seems to be a dependency between portal and sdc. Portal seems to now be waiting for SDC, and portal-vnc becomes stuck in Init if SDC hasn't been started. (Reversing the order by starting SDC first followed by Portal seems to allow both Portal and SDC to come up.) Maybe it's the same problem as OOM-514, maybe not, because deleting and restarting does not help.
The prepull script needs to login to 10001 in order to work - I recommend you work outside your proxy or open the port - you may have other issues later related to chef/git pulls anyway from inside containers.
We have tried to run the ONAP with the scripts, but both SDC-BE pods have problems to show up. For both we have following info from kubernetes scheduler:
4/17/2018 2:17:33 PM I0417 12:17:33.279347 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"onap", Name:"dev-sdc-be-config-backend-z92d6", UID:"81704d62-4233-11e8-ad46-024d5e023284", APIVersion:"v1", ResourceVersion:"91938", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' No nodes are available that match all of the predicates: Insufficient pods (1).
4/17/2018 2:17:33 PM I0417 12:17:33.439987 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"onap", Name:"dev-sdc-be-599585968d-bhxml", UID:"92387e7d-4233-11e8-ad46-024d5e023284", APIVersion:"v1", ResourceVersion:"91980", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' No nodes are available that match all of the predicates: Insufficient pods (1).
It looks like you've run out of resources. You could add more nodes to your cluster or deploy a subset of the complete ONAP suite. To customize your deployment, edit the kubernetes/onap/values.yaml file and enable just the components you're interested in. There are more complete instructions in the OOM User Guide.
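To make the "enable just the components you need" idea concrete, here is a minimal sketch that generates an override file. It assumes each component in kubernetes/onap/values.yaml is switched by a top-level `enabled` flag (check this against your branch); the helper name `build_override` and the component list are only for illustration and are not part of OOM.

```python
def build_override(all_components, wanted):
    """Return YAML text that enables only the wanted components.

    Assumption: every component is toggled by '<name>: enabled: true|false'.
    """
    unknown = sorted(set(wanted) - set(all_components))
    if unknown:
        raise ValueError("unknown components: " + ", ".join(unknown))
    lines = []
    for name in sorted(all_components):
        enabled = "true" if name in wanted else "false"
        lines.append(f"{name}:")
        lines.append(f"  enabled: {enabled}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative component names only; use the ones in your values.yaml.
    components = ["aaf", "aai", "clamp", "portal", "sdc", "so"]
    print(build_override(components, {"portal", "sdc", "so"}), end="")
```

The generated text could be saved to a file and passed to helm with its `-f` option, or simply used as a checklist while editing values.yaml by hand.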
I really doubt that, as my VM has 16 vCPUs and 104GB of RAM. The usage now is 7 vCPUs and around 60GB of RAM. If CPU or memory were the problem, the scheduler would probably report something like Insufficient CPU or Memory.
We are working on doing a deployment of ONAP. We are following the script instructions and downloading the cd.sh file.
However we get this error:
cp: cannot stat 'values.yaml': No such file or directory
I believe this is causing no pods to get setup, and is causing us other issues. I think we need to put this file in our current directory, but I don't see where to get it.
Could someone enlighten us on where to find the values.yaml file ?
See the current copy referenced in the oom_entrypoint.sh script - I only override the nexus3 repo - better to do the following which will be uploaded tonight
I have a problem with SDC-BE. When I try to enter SDC GUI (SDC-FE) in the SDC backend jetty logs I see the following exception:
2018-04-27T13:28:22.858Z|||||com.att.sdc.23911-SDCforTestDev-v001Client-0|||SDC-BE||||||||ERROR||||dev-sdc-be.onap||c.a.a.d.api.DefaultRequestProcessor||ActivityType=<?>, Desc=<2018-04-27 13:28:22.857 dev-sdc-be-55b786754f-62fwg 1865@dev-sdc-be-55b786754f-62fwg com.att.sdc.23911-SDCforTestDev-v001Client-0-252 null NULL com.att.aft.dme2.api.DefaultRequestProcessor AFT-DME2-0702 No endpoints were registered after trying all route offer search possibilities. Validate that the service has running instances that are properly registering and renewing their endpoint lease. [Context: service=https://dmaap-v1.dev.dmaap.dt.saat.acsi.att.com/events?version=1.0&envContext=TEST&partner=BOT_R&timeout=15000&limit=1;routeOffersTried=MR1:;]>
2018-04-27T13:28:22.858Z|||||com.att.sdc.23911-SDCforTestDev-v001Client-0|||SDC-BE||||||||ERROR||||dev-sdc-be.onap||o.o.s.b.c.d.e.DmaapClientFactory||ActivityType=<?>, Desc=<The exception {} occured upon fetching DMAAP message>
com.att.aft.dme2.api.DME2Exception: [AFT-DME2-0702]: No endpoints were registered after trying all route offer search possibilities. Validate that the service has running instances that are properly registering and renewing their endpoint lease. [Context: service=https://dmaap-v1.dev.dmaap.dt.saat.acsi.att.com/events?version=1.0&envContext=TEST&partner=BOT_R&timeout=15000&limit=1;routeOffersTried=MR1:;]
    at com.att.aft.dme2.api.DefaultRequestProcessor.send(DefaultRequestProcessor.java:188)
    at com.att.aft.dme2.api.RequestFacade.send(RequestFacade.java:26)
    at com.att.aft.dme2.api.DME2Client.send(DME2Client.java:116)
    at com.att.aft.dme2.api.DME2Client.sendAndWait(DME2Client.java:136)
    at com.att.aft.dme2.api.DME2Client.sendAndWait(DME2Client.java:320)
    at com.att.nsa.mr.client.impl.MRConsumerImpl.fetch(MRConsumerImpl.java:130)
    at com.att.nsa.mr.client.impl.MRConsumerImpl.fetch(MRConsumerImpl.java:100)
    at org.openecomp.sdc.be.components.distribution.engine.DmaapConsumer.lambda$consumeDmaapTopic$1(DmaapConsumer.java:59)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
^C
I can't locate the service to which it's pointing:
Hi, join the discussion and PTL meeting Monday - what you are experiencing is the nexus3 change made Friday by the LF. They have completely stopped onap deployment (all branches).
50 snapshot containers are being held until a release is requested - hence the ImagePullBackOff
Guys (DevOps) for your reference I requested the docker issue be the first one on the agenda of the 0900 EDT ONAP PTL meeting Monday to devote to this nexus3 devops issue – a quick fix and impact going forward – you are welcome to join.
I was able to launch instances of vFW, vSINK and vPG in openstack and my next step is to run robot heatbridge testcase.
I have created the vnfs using vFWCL as the service type.
But when I run robot heatbridge by using ./demo-k8s.sh heatbridge vFW_SINK_Module 9d16977c-0330-4f00-90e0-7a99a2fc5f23 vFWCL, I am getting error as "Dictionary does not contain key 'vFWCL'".
I saw that /var/opt/OpenECOMP_ETE/robot/assets/service_mappings.py in robot container does not have an entry for vFWCL, but it has entries for vFWSNK, vPKG and vFW.
Do I need to use any latest script? Or do I need to run heatbridge by using vFW or vFWSNK (eg. ./demo-k8s.sh heatbridge vFW_SINK_Module 9d16977c-0330-4f00-90e0-7a99a2fc5f23 vFW) in place of vFWCL even though I am using vFWCL as service type?
I suspect you missed a dash (there's a double dash before all-namespaces). The only time I've heard of pods being spontaneously lost is if your hardware is undersized.
It happened this morning, and I found that the helm release was gone, so I had to redo the helm install. We have 128 GB of RAM and 16 processors. Do you know which hardware component being undersized could cause this?
Generally best practice is to have them running on separate machines but I wouldn't think that is the problem here. Are you able to re-install and make progress?
Well this is just a VM on a larger machine, but the VM is assigned 128 GB. I don't think that should cause problems.
Yeah, I can reinstall. I usually end up having them go down some time after reinstalling, I am able to make progress, but I lose time redeploying once or twice a day, which can be pretty time consuming.
I know of systems that have been in production for a long time (months) so you shouldn't have to re-install normally. Thanks for not giving up, it's important that we identify these types of issues as you might not be the only one to experience them. Maybe we (I) can add further documentation to avoid the problem if we know what it is.
Thanks for your help so far. I'll see if it keeps happening, and if I can figure anything else out about it. Definitely we should document it if we can figure it out.
I did an install with all the components, and it is not disappearing like before. I have noticed one issue,
Some pods are stuck in pending with a failedScheduling error. It indicates insufficient pods. General utilization of other resources is very low. I read around and it seems I might need to add a new node to the cluster. I have plenty of resources available, so is this something I could do on the same machine ?
Happy to hear that William. Kubernetes has a limit of 110 pods on one node that probably seemed like an impossibly large number but we're hitting this in ONAP now. You can add nodes to your cluster (via Rancher) and the pods will get distributed. Unfortunately some of the projects have made assumptions that business logic and data-bases are co-located which isn't true in a system with multiple nodes so we've seen problems. The OOM team has been working with the other project teams to fix this so it would be great if you would try out a multi-node deployment (I expect you won't see any problems but...).
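As a back-of-the-envelope check on the 110-pods-per-node limit mentioned above, the sketch below estimates the minimum number of nodes for a given pod count. The `system_pods_per_node` parameter is an assumption for pods that run on every node regardless of ONAP, and the pod counts in the example are made up, not measured from a real deployment.

```python
import math

# Per-node pod limit quoted in the reply above.
DEFAULT_MAX_PODS_PER_NODE = 110


def min_nodes(total_pods, system_pods_per_node=0,
              max_pods=DEFAULT_MAX_PODS_PER_NODE):
    """Smallest node count whose combined pod slots fit total_pods."""
    if total_pods < 0:
        raise ValueError("total_pods must not be negative")
    usable = max_pods - system_pods_per_node
    if usable <= 0:
        raise ValueError("no pod slots left after system pods")
    return max(1, math.ceil(total_pods / usable))


if __name__ == "__main__":
    # Hypothetical numbers: 150 application pods, 10 system pods per node.
    print(min_nodes(150, system_pods_per_node=10))
```

This only counts pod slots; CPU, memory and disk still have to fit on each node separately.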
The configuration section of the OOM User Guide describes how you can customize your deployment to a subset of the total ONAP components. This allows a great deal of flexibility and may allow you to avoid the pod limit if you can work with less than all of the ONAP components.
I'm observing that k8s is very casual in pulling images. It can often be minutes before it pulls the next one even when there are many more images to pull still, and even if all of the images are available in a cache on the local network. Is there a configuration in k8s or helm where we can speed up the image pulling?
Incidentally this seems to be more pronounced in Beijing; I don't recall observing such a behavior in Amsterdam.
I've managed to deploy with kubernetes and pull up the portal.
I am currently trying to log in and having issues. I use the user demo/demo123456! and it seems correct based on the debug logs on the container. However, the login doesn't work. The backend doesn't send back a session ID.
I notice from the error logs, it states that the session is expired, and there is a nullpointer in the music project, while trying to get a lock from zookeeper.
java.lang.NullPointerException: null at org.onap.music.lockingservice.MusicLockingService.createLockId(MusicLockingService.java:112)
I am not sure if this is the place to ask about it. I would appreciate any help or direction on a better place to ask.
We are deploying with 2 VMs on a machine with KVM, and Rancher. We are having issues with sessions in the Portal UI.
However, the Portal team was not able to reproduce our issues. I noticed that the test environment is Openstack based, but we do not use that. Is it possible that using a non Openstack environment could be causing issues with some of the pods, and cause issues with sessions in a spring UI like the Portal ?
Depending on your undercloud - the IP discovered by the agent may not be the routable IP - you can override this with the setting -c false and a routableip
I had an issue (evidently I tested with true on AWS but not false the last time I committed) with the format of the alternative option when -c is false - I have a patch I am trying to get in for the last 3 days
using -c false seems to only be needed on openstack systems - not AWS
If you override the address then use the patch below
Error from server (BadRequest): a container name must be specified for pod onap-clamp-7d69d4cdd7-ct7b6, choose one of: [clamp-filebeat-onap clamp] or one of the init containers: [clamp-readiness]
There are only 2 containers in this pod (clamp-filebeat-onap and clamp), but the init container clamp-readiness is trying to use a container called clampdb, which is why it failed. I'm not sure what I should change here. Should I replace clampdb with clamp?
What do you get when you try kubectl describe po/onap-clamp-7d69d4cdd7-ct7b6 -n onap ? It should tell you which container is not coming up and what is the reason for failure.
I have a bare metal machine with 337G RAM and 72 CPUs, and on it I have created a virtual machine with 140G RAM and 32 CPUs using vagrant.
After creating the virtual machine, I created the Rancher environment using the script below
What version of ONAP are you trying to work with? You're using the instructions from 'latest' (basically Beijing at this point) but you say that you're trying to install the Amsterdam release.
my requirement is Amsterdam version of ONAP. please help me out and guide me for the same and provide me the way and steps, how i can install Amsterdam version of ONAP
What is the CPU usage on your worker node? Does your worker node constantly go to "Not Ready" State? I recommend you to increase your vCPU on your VM to 48 or 64.
PS: For Amsterdam, you can run it on an all-in-one node k8s. But for Beijing (master), you need to have at least 2 worker nodes to fit all of the pods created by ONAP
Let's check why there are pods in the Evicted and Pending states. You can find out by issuing "kubectl describe pod <pod-name> -n <namespace>"; it should tell you the reason. I suspect some resource (storage, CPU, etc.) in your environment is insufficient, so the pod gets evicted from the node and the scheduler can't find a node to assign it to.
I tried to install ONAP Beijing using OOM/Rancher/Kubernetes/Helm and noticed that some of the pods are stuck in the "CrashLoopBackOff" state, and some are in the Running state but with Ready 0/1. Before confirming this to be an issue with the installation or component configuration, I would like to know if there is a way I can selectively restart the pods and pull images afresh. I have seen the command "helm del dev --purge", but this deletes all the pods and takes too much time.
I tried to install ONAP Beijing using OOM and my system configuration is 350G RAM, 72 CPUs, 3T Disk.
I tried to install the ONAP Beijing services one by one using "helm install local/service_name". I successfully installed 22 services, but the services I installed after that are stuck in the Pending state.
I am now struggling to deploy ONAP Beijing via OOM with 2 nodes. But I found there was CrashLoopBackOff with aaf and clamp. I tried to re-deploy many times, but this problem still existed. I noticed the previous aai-champ issue which I met before was resolved by "Merge "Fix aai-champ service" into beijing Borislav Glozman". So I believe that there should be some fixes in aaf and clamp. Could anybody look at this issue?
onap@onap-PowerEdge-R730:~$ kubectl get pods -n onap -o wide|grep 0/
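One caveat with the `grep 0/` filter above: it also matches rows such as `10/10` and completed job pods. A minimal sketch of a stricter filter follows, assuming the default column order NAME, READY, STATUS of `kubectl get pods -n <namespace>`. The sample rows are made up for illustration, except the dev-aaf-locate pod name, which is taken from this thread.

```python
def not_ready_pods(kubectl_output):
    """Return (name, ready, status) for pods with fewer ready containers
    than expected, skipping the header and Completed job pods."""
    result = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        try:
            have, want = (int(x) for x in ready.split("/"))
        except ValueError:
            continue  # second column is not in n/m form, skip the line
        if have < want and status != "Completed":
            result.append((name, ready, status))
    return result


# Illustrative input only; paste real `kubectl get pods -n onap` output here.
SAMPLE = """NAME READY STATUS RESTARTS AGE
dev-aaf-locate-57c96f8bb9-spn7r 0/1 CrashLoopBackOff 12 1h
dev-example-job-abcde 0/1 Completed 0 1h
dev-example-multi-xyz12 10/10 Running 0 1h
dev-example-init-qwert 0/1 Init:0/1 0 1h
"""

if __name__ == "__main__":
    for pod in not_ready_pods(SAMPLE):
        print(*pod)
```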
Can you describe the pod to see if it is liveness or readiness issue?
I faced a similar CrashLoopBackOff for dev-aaf-oauth and dev-aaf-service yesterday. Increasing the liveness and readiness delays in the following file solved the problem
oom/kubernetes/aaf/values.yaml
liveness:
  initialDelaySeconds: 180
  periodSeconds: 10
  # necessary to disable liveness probe when setting breakpoints
  # in debugger so K8s doesn't restart unresponsive container
  enabled: true
readiness:
  initialDelaySeconds: 60
  periodSeconds: 10
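To see why a longer initialDelaySeconds helps a slow-starting container, here is a rough estimate of how long a container gets before the kubelet restarts it for failing liveness checks. It assumes the chart passes these values straight through to the pod's livenessProbe and that failureThreshold is left at the Kubernetes default of 3. Probe timeouts and kubelet scheduling jitter are ignored, so treat the result as approximate.

```python
def seconds_until_restart(initial_delay, period, failure_threshold=3):
    """Approximate time from container start to the probe failure that
    triggers a restart, for a container that never answers its probe.

    The first probe runs after initial_delay, then one every period;
    the restart follows the failure_threshold-th consecutive failure.
    """
    if failure_threshold < 1:
        raise ValueError("failure_threshold must be at least 1")
    return initial_delay + (failure_threshold - 1) * period


if __name__ == "__main__":
    # Liveness values quoted above: initialDelaySeconds 180, periodSeconds 10.
    print(seconds_until_restart(180, 10))
```

If a component routinely needs longer than this estimate to open its port, the liveness probe will keep killing it, which shows up as CrashLoopBackOff.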
I listed the description of the CrashLoopBackOff pod dev-aaf-locate-57c96f8bb9-spn7r below. I am not sure if it was related to liveness or readiness, but there were no liveness or readiness events in the log. I will try increasing the liveness and readiness delays as you suggested. Hopefully it can solve this issue.
Hi Borislav Glozman, I configured my environment with 6 nodes and all the nodes are active in the Rancher UI, but when I run the kubectl get nodes command in a terminal it displays only one worker node.
Please tell me what the problem with my setup is and how I can resolve it.
Please find the attached image for reference.
I also want to know: the kube-system services are running on only one worker node and are shared with the other worker nodes. Is that OK?
I am planning to install onap Beijing release using approach onap on kubernetes. I don't want to use heat template for onap installation.
I have few queries
1> For the kubernetes environment, I don't want to use the rancher approach. But then how can I update the kubectl config? In the OOM User Guide, it's mentioned to paste the kubectl config from Rancher
2> I have a high end server of 256 GB Ram . So Do i still require multiple node setup
3> If i go ahead with Rancher , Can i install Onap directly on server or Do i have to install Openstack Ocata and create instance on it.
did you get any answer to question number 3? I am also planning to install ONAP Beijing release on Kubernetes but wondering if I still need to install openstack first or not.
I'm trying to install ONAP Beijing with OOM. The message-router pod failed with connection refused on port 3904. Can anyone suggest anything on this?
Events:
  Type     Reason                 Age                From               Message
  ----     ------                 ----               ----               -------
  Normal   Scheduled              9m                 default-scheduler  Successfully assigned dmaap-dev-message-router-68b49998bf-h7w4x to k8s-2
  Normal   SuccessfulMountVolume  9m                 kubelet, k8s-2     MountVolume.SetUp succeeded for volume "cadi"
  Normal   SuccessfulMountVolume  9m                 kubelet, k8s-2     MountVolume.SetUp succeeded for volume "appprops"
  Normal   SuccessfulMountVolume  9m                 kubelet, k8s-2     MountVolume.SetUp succeeded for volume "mykey"
  Normal   SuccessfulMountVolume  9m                 kubelet, k8s-2     MountVolume.SetUp succeeded for volume "default-token-rtdjb"
  Normal   SuccessfulMountVolume  9m                 kubelet, k8s-2     MountVolume.SetUp succeeded for volume "localtime"
  Normal   Pulled                 9m                 kubelet, k8s-2     Container image "oomk8s/readiness-check:2.0.0" already present on machine
  Normal   Started                9m                 kubelet, k8s-2     Started container
  Normal   Created                9m                 kubelet, k8s-2     Created container
  Warning  Unhealthy              7m (x6 over 8m)    kubelet, k8s-2     Readiness probe failed: dial tcp 10.42.229.95:3904: getsockopt: connection refused
  Normal   Pulled                 7m (x2 over 8m)    kubelet, k8s-2     Container image "nexus3.onap.org:10001/onap/dmaap/dmaap-mr:1.1.4" already present on machine
  Normal   Killing                7m                 kubelet, k8s-2     Killing container with id docker://message-router:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Created                7m (x2 over 8m)    kubelet, k8s-2     Created container
  Normal   Started                7m (x2 over 8m)    kubelet, k8s-2     Started container
  Warning  Unhealthy              4m (x11 over 8m)   kubelet, k8s-2     Liveness probe failed: dial tcp 10.42.229.95:3904: getsockopt: connection refused
I installed the ONAP Beijing release using OOM and I want to check the list of installed services and their state. How can I check the list of active services and how the services interact with each other?
How can I verify that all the services were installed successfully?
All components interact with each other using their internal APIs. If you want to know the state of a particular component you can run the health check `/oom/kubernetes/robot/ete-k8s.sh health`; if the result looks good for a component, you can use it.
ONAP has many components. If I were in your place I would analyze my use case and only look at the components of interest; the developer wiki is a nice place to start to understand each component.
Hi Michael O'Brien and All, I am running Kubernetes in AWS clustered environment 1 master + 4 nodes. As some portals are running at different nodes. So as per readthedocs, I guess I should create separate elastic IP for each nodes so that it wont change on restarts. Also then I have to configure them in my local hostfile to map to different URLs eg.,
As elastic IPs in AWS are limited, how is your cluster demo handling this? Please note we are not configuring Route 53 DNS; just to try it out, we will stick to elastic IPs for accessing the portals.
kubernetes takes care of this for you via service routing using the dns service inside k8s.
you only need an EIP for the master - so that rancher host registration does not change.
For the cluster nodes - these can keep their ephemeral IPs as long as the VMs are not rebooted.
I use route53 only so I register a domain name not an ip in rancher - and to keep a github oauth up during cluster rebuilds - to keep the crypto miners off the 10250 port.
Ideally you access all the above GUI's (aai, policy etc) from portal - which is different - it is using a non-ELB rancher supplied load balancer implementation - so runs from any VM host.
Thanks a lot, that clarifies. While running OOM scripts I have server parameter with Private Domain Name, so that it wont change in reboot. I noticed something strange while restarting VMs, Portal users other than Demo was missing. Is this a known issue?
Onap will come up on one or more ubuntu 16 metal/vms regardless of the undercloud (aws, Azure, gcd, openstack...) as long as you use rancher to get a default LoadBalancer. You only need openstack to point SO at to instantiate VMs currently, until the cloud plugins are finished.
There are 5 installation scripts around; one of the early ones is in my logging-analytics repo under deploy - still running helm install instead of the helm plugin. My CD system runs on AWS for example and my dev system on an ubuntu VM on VMWare on my laptop (no openstack)
Thank you for the response Michael. So if I can get one server that supports the below hardware requirements (+ install Ubuntu 16), then that should be enough to bring up ONAP OOM Beijing on Kubernetes, correct?
OOM is based on K8s and Helm which enables EKS and other K8s native based installs; however, I don't know of anyone actually doing this yet. Have you tried this?
Hi all, I've tried to install ONAP via OOM (disabled dcaegen2 since I don't want to use DCAE) using https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_quickstart_guide.html. But most of the pods are in CrashLoopBackOff/ContainerCreating/Error/ErrImagePull/Init etc. states. I have increased the liveness/readiness delays for aaf as stated here in the comments. Any caveats on how to make it work? Is rancher mandatory to deploy?
My environment is VMware ESXi + KubernetesAnywhere (proper kubernetes and helm version in place). I have 4 nodes + 1 master, each 8 vCPUs & 32GB ram.
You don't need rancher - but it is our reference implementation (used to bootstrap kubernetes and help setup the cluster - it also provides a default loadbalancer implementation - so you don't have to configure for example AWS ELB/ALB)
If you use something else like kubeadm - any issues you have - you are kind of on your own - hence why we are standardizing on one implementation to minimize undercloud variances.
If more than a couple (less than 5) have imagepull errors - either nexus3 is hammered (not likely usually) - or you are having issues inside your network pulling from nexus3
It is hard to tell from the docs if the rancher-cni implementation is only for cattle or kubernetes in our case. I don't think they use flannel for CNI - but I am not 100% sure yet until I verify it
79e4e19bf040 rancher/net:v0.13.17 "/rancher-entrypoi..." 2 weeks ago Up 2 weeks r-ipsec-cni-driver-1-e8bee4f4
I am not sure if here is the right place to ask my question. Please advise if it is not ...
I am working on ONAP deployment options and I am investigating the ONAP OOM option with Kubernetes on Openstack.
I installed a few ONAP Beijing components with chart version 2.0.0 (such as SO, SDC, Portal, …) on an Openstack cloud by installing a Kubernetes cluster with Rancher in multiple hosts to manage the entire life-cycle of my ONAP installation.
The setup was completed and most of the pods were running with no issues.
My question is now about upgrading this light instance of ONAP from Beijing to Casablanca without losing any data or causing any downtime. I figured out that different infrastructure versions of Helm/Kubernetes/Rancher/kubectl are supported by the Beijing and Casablanca releases. So, I managed (not an easy task!) to upgrade them accordingly, but some of the installed ONAP Beijing containers went down. I also added repos locally for Casablanca with chart version 3.0.0. However, my attempts at the platform upgrade have not been successful so far (neither from the yaml file nor component-by-component).
All the examples and guides in Wiki are for minor upgrades in same release and I could not find anywhere mentioning on how to upgrade/rollback between major releases (e.g., Beijing-Casablanca). Is this supported and tested? If so, could you please give me the guideline links?
I even tried installing a single component (e.g., SO) in Beijing and upgrading it to Casablanca, but it was not successful. Any idea or reason?
Major upgrades are still a work in progress. The OOM team is working on some capabilities to assist in upgrades but this will not be sufficient in itself - there will always be work required from the project teams to make major upgrades seamless. One of the more challenging aspects is dealing with schema changes across versions which require migration scripts and versioned APIs. You may want to follow: OOM-9
Note that the versions below are for rancher 1.6.25, in queue for LOG-895 - but your timeouts during deployment are not good - they look random - are you running a fast enough master VM (vCore, ram, network) - looks like your cluster may be saturated - check top on the vm.
ubuntu@onap-oom-obrien-rancher-0:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T18:02:47Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.5-rancher1", GitCommit:"44636ddf318af0483af806e255d0be4bb6a2e3d4", GitTreeState:"clean", BuildDate:"2018-12-04T04:28:34Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
ubuntu@onap-oom-obrien-rancher-0:~$ docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Experimental: false
ubuntu@onap-oom-obrien-rancher-0:~$ helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
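When comparing an environment against the versions listed above, one quick check is that the Helm client and Tiller report the same SemVer. The following is only a sketch: helm_versions_match is a hypothetical helper name, and it assumes the Helm v2 style output shown above (a "Client:" line and a "Server:" line).

```shell
#!/bin/sh
# Sketch: succeed only if the Client and Server lines of Helm v2 style
# `helm version` output (read from stdin) carry the same SemVer value.
# helm_versions_match is a hypothetical helper, not part of helm or OOM.
helm_versions_match() {
  local client="" server="" line
  while IFS= read -r line; do
    case "$line" in
      Client:*) client=$(printf '%s\n' "$line" | sed -n 's/.*SemVer:"\([^"]*\)".*/\1/p') ;;
      Server:*) server=$(printf '%s\n' "$line" | sed -n 's/.*SemVer:"\([^"]*\)".*/\1/p') ;;
    esac
  done
  # Both values must be present and identical.
  [ -n "$client" ] && [ "$client" = "$server" ]
}

# Usage on the master node:
#   helm version | helm_versions_match && echo "helm client and tiller match"
```

A mismatch does not prove anything is broken, but it is a cheap thing to rule out before looking at timeouts.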
Could anyone tell me which OpenStack version is compatible as the infrastructure for an OOM installation? Is Ocata still OK, or should we move to a higher version such as Pike? I installed Beijing on Ocata.
I plan to install the ONAP C version in our lab. Should OpenStack be ready before I install ONAP? I mean, can I install ONAP directly on a physical server without OpenStack?
We are trying to install Casablanca on Kubernetes and are facing issues with AAI. AAI is not able to run because of an issue with graphadmin. Can you let us know if we have missed some configuration? We followed the same sequence that you specified above. All the other components (consul, msb, dmaap, dcaegen2, aaf, robot) are up and running.
The following error is seen in the "aai-aai-graphadmin-create-db-schema-mjc4z" job.
Project Build Version: 1.0.1
chown: changing ownership of '/opt/app/aai-graphadmin/resources/application.properties': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/etc/appprops/aaiconfig.properties': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/etc/appprops/janusgraph-cached.properties': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/etc/appprops/janusgraph-realtime.properties': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/etc/auth/aai_keystore': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/localhost-access-logback.xml': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/logback.xml': Read-only file system
Wed Apr 24 16:09:21 IST 2019 Starting /opt/app/aai-graphadmin/bin/createDBSchema.sh
---- NOTE --- about to open graph (takes a little while)--------;
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
    at org.springframework.boot.loader.PropertiesLauncher.main(PropertiesLauncher.java:595)
Caused by: java.lang.ExceptionInInitializerError
    at org.onap.aai.dbmap.AAIGraph.getInstance(AAIGraph.java:103)
    at org.onap.aai.schema.GenTester.main(GenTester.java:126)
    ... 8 more
Caused by: java.lang.RuntimeException: Failed to instantiate graphs
    at org.onap.aai.dbmap.AAIGraph.<init>(AAIGraph.java:85)
    at org.onap.aai.dbmap.AAIGraph.<init>(AAIGraph.java:57)
    at org.onap.aai.dbmap.AAIGraph$Helper.<clinit>(AAIGraph.java:90)
    ... 10 more
Caused by: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:159)
    at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration.get(KCVSConfiguration.java:100)
    at org.janusgraph.diskstorage.configuration.BasicConfiguration.isFrozen(BasicConfiguration.java:106)
    at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1394)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:164)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:133)
    at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:113)
    at org.onap.aai.dbmap.AAIGraph.loadGraph(AAIGraph.java:115)
    at org.onap.aai.dbmap.AAIGraph.<init>(AAIGraph.java:82)
    ... 12 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT1M
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
    ... 21 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:161)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:115)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getSlice(AstyanaxKeyColumnValueStore.java:104)
    at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:103)
    at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:100)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:148)
    at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:162)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
    ... 22 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.233.66.28(10.233.66.28):9160, latency=1(1), attempts=1]UnavailableException()
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:153)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538)
    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:159)
    ... 29 more
Caused by: UnavailableException()
    at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14687)
    at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14633)
    at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:14559)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:741)
    at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:725)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:544)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:541)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
    ... 35 more
Failed to run the tool /opt/app/aai-graphadmin/bin/createDBSchema.sh successfully
Failed to run the createDBSchema.sh
I am trying to revive my ONAP instance after disaster recovery of etcd cluster.
I have all 6 kube-system pods running, and it also shows one old tiller-deploy pod in the Unknown state, but I am unable to deploy new pods. Both sudo helm deploy and helm list give the same error:
onap-m1@onap-m1:~$ helm list
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
onap-m1@onap-m1:~/oom/kubernetes/nbi$ sudo helm deploy nbi local/onap --namespace onap --verbose
fetching local/onap
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
onap-m1@onap-m1:~/oom/kubernetes/nbi$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
heapster-7b48b696fc-rzl8j 1/1 Running 0 49d
kube-dns-6655f78c68-hqpld 3/3 Running 0 49d
kubernetes-dashboard-6f54f7c4b-s94dh 1/1 Running 0 49d
monitoring-grafana-7877679464-jmkcc 1/1 Running 0 49d
monitoring-influxdb-64664c6cf5-8k5nm 1/1 Running 0 49d
tiller-deploy-6f4745cbcf-7ccql 1/1 Running 0 49d
tiller-deploy-b5f895978-p7vtc 0/1 Unknown 0 55d
onap-m1@onap-m1:~/oom/kubernetes/nbi$ helm init --upgrade
$HELM_HOME has been configured at /home/infyonap-m1/.helm.
Tiller (the Helm server-side component) has been upgraded to the current version. Happy Helming!
I am trying to install ONAP Dublin and get this error while running make all:
Error: Can't get a valid version for repositories aai. Try changing the version constraint in requirements.yaml
make[1]: *** [dep-onap] Error 1
make[1]: Leaving directory `/home/centos/dublin/oom/kubernetes'
make: *** [onap] Error 2
The oom/kubernetes/aai folder is empty. Has anyone solved this issue?
I am installing the ONAP Dublin release using OOM with RKE and OpenStack, following the documentation provided in the link. The pods below are in CrashLoopBackOff because of "No private IPv4 address found". Any help would be highly appreciated.
I created a router which connects the above two interfaces. The security group is the default one, with rules added to allow all ingress and egress traffic for TCP, UDP, ICMP, DNS, etc.
The Kubernetes cluster is created using RKE and the cluster.yml file. Below are the Kubernetes cluster details.
Master - 1 node
Worker - 6 nodes
I also created an instance where RKE is installed and which also serves as an NFS server.
Kindly let me know which logs can help to debug the issue.
Nonetheless, I patched things up and managed to deploy. But at the end my installation had no pods to show, despite the helm charts installing successfully.
So I came here to follow the installation links, only to realise that the links highlighted above are broken.
Please can someone provide working links for the installation of OOM with Kubernetes?
562 Comments
kranthi guttikonda
Hi Michael O'Brien Does this include DCAE as well? I think this is the best way to install ONAP. Does this include any config files as well to talk to openstack cloud to instantiate VNFs?
Michael O'Brien
Sorry,
DCAE is not currently in the repo yet - that will require consolidation of the DCAE Controller (a lot of work)
../oneclick/dcae.sh is listed as "under construction"
As far as I know VNFs like the vFirewall come up, however closed loop operations will need DCAE.
/michael
Gülsüm Atıcı
Hi,
I am planning to install ONAP but can't decide which setup to use: the full ONAP setup on VMs or the Kubernetes-based setup with containers. Will both solutions be developed in the future, or will development continue with only one of them?
Do you have any advice about it?
Kumar Lakshman Kumar
Hi Gatici,
You can use the Kubernetes one. In Beijing even DCAE is containerized. You can use OOM to install the full ONAP on a Kubernetes cluster.
Gülsüm Atıcı
Thanks Kumar.
kranthi guttikonda
Thanks Michael O'Brien
Jason Hunt
I see the recently added update about not being able to pull images because of missing credentials. I encountered this yesterday and was able to get a workaround done by creating the secret and embedding the imagePullSecrets to the *-deployment.yaml file.
Here's steps just for the robot:
then added to the robot-deployment.yaml (above volumes):
This has to be done for each namespace, and each script with the image would need to be updated. An alternative that I'm looking at is:
- modify the default service account for the namespace to use this secret as an imagePullSecret.
- kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "myregistrykey"}]}'
- Now, any new pods created in the current namespace will have this added to their spec:
(from https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ )
This would probably have to be done in the createAll.bash script, possibly with the userid/password as parameters to that script.
Is there a suggested approach? if so, I can submit some updates.
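The per-namespace approach described above could be sketched roughly as follows. This is an illustration under assumptions, not the createAll.bash change itself: add_pull_secret is a hypothetical helper name, the secret name myregistrykey is taken from the patch command above, and the registry address and e-mail placeholder are the ones used elsewhere on this page.

```shell
#!/bin/sh
# Sketch: create a docker-registry secret in one namespace, then attach it
# to that namespace's default service account so that newly created pods
# pick it up as an imagePullSecret.
# add_pull_secret is a hypothetical helper; userid and password are passed
# in as parameters rather than hardcoded.
add_pull_secret() {
  local namespace="$1" user="$2" password="$3"
  local secret="myregistrykey"            # assumed secret name
  local server="nexus3.onap.org:10001"    # registry used on this page
  local patch

  kubectl --namespace "$namespace" create secret docker-registry "$secret" \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$password" \
    --docker-email="email@email.com" || return 1

  patch=$(printf '{"imagePullSecrets": [{"name": "%s"}]}' "$secret")
  kubectl --namespace "$namespace" patch serviceaccount default -p "$patch"
}

# Usage (once per namespace, since secrets are namespace scoped):
#   add_pull_secret onap-robot <userid> <password>
```

Pods that already exist are not changed by the patch; only pods created afterwards get the secret added to their spec.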
Michael O'Brien
Talk about parallel development - google served me
https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#create-a-secret-that-holds-your-authorization-token
kubectl create secret docker-registry regsecret --docker-server=nexus3.onap.org:10001 --docker-username=docker --docker-password=docker --docker-email=email@email.com
testing this now
/michael
Michael O'Brien
Jason,
In our current environment (namespace 1:1 → service 1:1 → pod 1:1 → docker container) it looks like the following single command will have a global scope, with no need to modify individual yaml files. It is a slight alternative to what you have suggested, which would work as well.
kubectl create secret docker-registry regsecret --docker-server=nexus3.onap.org:10001 --docker-username=docker --docker-password=docker --docker-email=email@email.com
So no code changes which is good. Currently everything seems to be coming up - but my 70G VM is at 99% so we need more HD space.
Edit: actually even though it looked to work
2017-06-30T19:31 UTC 2017-06-30T19:31 UTC pulling image "nexus3.onap.org:10001/openecomp/sdc-elasticsearch:1.0-STAGING-latest" kubelet 172.17.4.99 spec.containers{sdc-es} 2 2017-06-30T19:31 UTC 2017-06-30T19:31 UTC
still getting errors without the namespace for each service like in your example - if we wait long enough
So a better fix Yves and I are testing is to put the line just after the namespace creation in createAll.bash
create_namespace() {
  kubectl create namespace "$1-$2"
  kubectl --namespace "$1-$2" create secret docker-registry regsecret --docker-server=nexus3.onap.org:10001 --docker-username=docker --docker-password=docker --docker-email=email@email.com
}
/michael
Jason Hunt
Michael,
I'm surprised that it appears to work for you, as it doesn't for my environment. First, you should have to specify the imagePullSecrets for it to work... that can either be done in the yaml or by using the patch serviceaccount command. Second, the scope of the secret for imagePullSecrets is just that namespace:
Pods can only reference image pull secrets in their own namespace, so this process needs to be done one time per namespace.
source: https://kubernetes.io/docs/concepts/containers/images/#creating-a-secret-with-a-docker-config
In your environment, had you pulled the images previously? I noticed in my environment that it would find a previously pulled image even if I didn't have the authentication credentials. To test that, I had to add "imagePullPolicy: Always" to the *-deployment.yaml file under the container scope, so it would always try to pull the image.
So I think a fix is necessary. I can submit a suggested change to the createAll.bash script that creates the secret and updates the service account in each namespace?
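To tell whether a namespace is actually set up to use a secret (as opposed to succeeding because of a cached image), the default service account can be inspected directly. A minimal sketch, assuming the secret is attached to the default service account; sa_has_pull_secret is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if the default service account in the given namespace
# lists the given secret under imagePullSecrets.
# sa_has_pull_secret is a hypothetical helper name.
sa_has_pull_secret() {
  local namespace="$1" secret="$2" names name
  # jsonpath prints the referenced secret names separated by spaces.
  names=$(kubectl --namespace "$namespace" get serviceaccount default \
    -o jsonpath='{.imagePullSecrets[*].name}') || return 1
  for name in $names; do
    [ "$name" = "$secret" ] && return 0
  done
  return 1
}

# Usage:
#   sa_has_pull_secret onap-robot regsecret || echo "default SA not patched"
```

This only checks the service account. A deployment yaml that sets imagePullSecrets itself would not show up here.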
Jason Hunt
I think you'll need to add to the service account, too, so....
I will test now.
Michael O'Brien
We previously saw a successful pull from nexus3 - but that turned out to be a leftover mod in my branch yaml for a specific pod.
Yes, I should know in about 10 min (in the middle of a redeploy) if I need to patch. It makes sense, because otherwise it would assume a magical 1:1 association. What if I created several secrets?
I'll adjust and retest.
btw, thanks for working with us getting Kubernetes/oom up!
/michael
Jason Hunt
My test of the updated create_namespace() method eliminated all of the "no credentials" errors. I have plenty of other errors (most seem to be related to the readiness check timing out), but I think this one is licked.
Is there a better way to track this than the comments here? Jira?
Michael O'Brien
JIRA is OOM-3
Michael O'Brien
Looks like we will need to specify the secret on each yaml file - because of our mixed nexus3/dockerhub repos
When we try to pull from dockerhub - the secret gets applied
Failed to pull image "oomk8s/readiness-check:1.0.0": unexpected EOF
Error syncing pod, skipping: failed to "StartContainer" for "mso-readiness" with ErrImagePull: "unexpected EOF"
MountVolume.SetUp failed for volume "kubernetes.io/secret/3a7b5084-5dd2-11e7-b73a-08002723e514-default-token-fs361" (spec.Name: "default-token-fs361") pod "3a7b5084-5dd2-11e7-b73a-08002723e514" (UID: "3a7b5084-5dd2-11e7-b73a-08002723e514") with: Get http://127.0.0.1:8080/api/v1/namespaces/onap3-mso/secrets/default-token-fs361: dial tcp 127.0.0.1:8080: getsockopt: connection refused
retesting
Michael O'Brien
Actually our mso images loaded fine after internal retries - bringing up the whole system (except dcae) - so this is without a secret override on the yamls that target nexus3.
It includes your patch line from above
My vagrant vm ran out of HD space at 19G - resizing
v.customize ["modifyhd", "aa296a7e-ae13-4212-a756-5bf2a8461b48", "--resize", "32768"]
This won't work on the coreos image, so I am moving up one level of virtualization, from (docker on virtualbox on vmware-rhel73 in win10) to (docker on virtualbox on win10)
vid still failing on FS
/michael
Vaibhav Chopra
I am getting "Error syncing pod" errors while bringing up just the aai and vid pods for now.
I even implemented both of the fixes mentioned in OOM-3:
1)
create_namespace() {
  kubectl create namespace "$1-$2"
  kubectl --namespace "$1-$2" create secret docker-registry regsecret --docker-server=nexus3.onap.org:10001 --docker-username=docker --docker-password=docker --docker-email=email@email.com
  kubectl --namespace "$1-$2" patch serviceaccount default -p '{"imagePullSecrets": [{"name": "regsecret"}]}'
}
2) Adding below in vid-server-deployment.yaml
Errors:-
aai-service-403142545-f620t
onap-aai
Waiting: PodInitializing
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-aai.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal
Error syncing pod
vid-mariadb-1108617343-zgnbd
onap-vid
Waiting: rpc error: code = 2 desc = failed to start container "c4966c8f8dbfdf460ca661afa94adc7f536fd4b33ed3af7a0857ecdeefed1225": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36/etc/mysql/my.cnf\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-vid.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal
Error: failed to start container "vid-mariadb": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/vid/vid/lf_config/vid-my.cnf\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/8a2abc00538b1bec820b272692b4367922893fb7eed6851cfca6e4d3445d1b36/etc/mysql/my.cnf\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
Error syncing pod
Is there anything I am missing here?
Michael O'Brien
Vaibhav,
Hi, OOM-3 has been deprecated (it is in the closed state). The secrets fix is implemented differently now, so you don't need the workaround.
Also, the "search line limits" message is a bug in Rancher that you can ignore. It is warning that more than 5 DNS search terms were used, which is not an issue. See my other comments on this page.
https://github.com/rancher/rancher/issues/9303
The only real issue is "Error syncing pod". This is (most likely) an intermittent timing issue that we are working on; a faster system with more cores should see less of this.
If you only have 2 working pods, you might not have run the config-init pod. Verify you have /dockerdata-nfs on your host FS.
for vid you should see (20170831 1.1 build)
onap-vid vid-mariadb-2932072366-gw6b7 1/1 Running 0 1h
onap-vid vid-server-377438368-bt6zg 1/1 Running 0 1h
/michael
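The /dockerdata-nfs check mentioned above can be scripted before bringing up any pods. A small sketch, with config_mount_ready as a hypothetical helper name; it only tests that the directory exists and is not empty, not that its contents are complete.

```shell
#!/bin/sh
# Sketch: succeed if the shared config directory exists on the host and
# contains at least one entry (i.e. the config-init pod has likely run).
# config_mount_ready is a hypothetical helper name.
config_mount_ready() {
  local dir="${1:-/dockerdata-nfs}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage on the Kubernetes host:
#   config_mount_ready || echo "run the config-init pod first"
```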
Vaibhav Chopra
Hi Michael,
I had run config-init, but at that time I was installing the components one by one. Now I tried to install everything in one go and got success for the pods below:
kube-system heapster-4285517626-q0996 1/1 Running 5 19h 10.42.41.231 storm0220.cloud.com
kube-system kube-dns-2514474280-kvcvx 3/3 Running 12 19h 10.42.4.230 storm0220.cloud.com
kube-system kubernetes-dashboard-716739405-fjxpm 1/1 Running 7 19h 10.42.35.168 storm0220.cloud.com
kube-system monitoring-grafana-3552275057-0v7mk 1/1 Running 6 19h 10.42.128.254 storm0220.cloud.com
kube-system monitoring-influxdb-4110454889-vxv19 1/1 Running 6 19h 10.42.159.54 storm0220.cloud.com
kube-system tiller-deploy-737598192-t56wv 1/1 Running 2 19h 10.42.61.18 storm0220.cloud.com
onap-aai hbase-2720973979-p2btt 0/1 Running 0 17h 10.42.12.51 storm0220.cloud.com
onap-appc appc-dbhost-3721796594-v9k2k 1/1 Running 0 17h 10.42.215.107 storm0220.cloud.com
onap-message-router zookeeper-4131483451-r5msz 1/1 Running 0 17h 10.42.76.76 storm0220.cloud.com
onap-mso mariadb-786536066-dx5px 1/1 Running 0 17h 10.42.88.165 storm0220.cloud.com
onap-policy mariadb-1621559354-nbrvh 1/1 Running 0 17h 10.42.108.42 storm0220.cloud.com
onap-portal portaldb-3934803085-fj217 1/1 Running 0 17h 10.42.145.204 storm0220.cloud.com
onap-robot robot-1597903591-fffz3 1/1 Running 0 1h 10.42.253.121 storm0220.cloud.com
onap-sdnc sdnc-dbhost-3459361889-7xdmw 1/1 Running 0 17h 10.42.58.17 storm0220.cloud.com
onap-vid vid-mariadb-1108617343-gsv8f 1/1 Running 0 17h 10.42.175.190 storm0220.cloud.com
But yes, many of them are again stuck with the same error: "Error Syncing POD".
And yes, the server I am using now has 128GB RAM. (I have configured the proxy in the best known manner, but do you think this could also be related to the proxy? If so, I will dig more in that direction.)
BR/
VC
Michael O'Brien
I'll contact you directly about proxy access.
Personally I try to run on machines/VMs outside the corporate proxy - to avoid the proxy part of the triage equation
/michael
Vaibhav Chopra
Sure, Thanks Frank,
I will check the proxy. In any case, apart from the proxy, please update us whenever you find a fix for "Error Syncing POD".
Currently 20 out of 34 ONAP pods are running fine, and the rest are all failing with "Error syncing POD".
BR/
VC
Michael O'Brien
Update: containers are loading now - for example both pods for VID come up ok if we first run the config-init pod to bring up the config mounts. Also there is an issue with unresolved DNS entries that is fixed temporarily by adding to /etc/resolv.conf
1) mount config files
root@obriensystemsucont0:~/onap/oom/kubernetes/config# kubectl create -f pod-config-init.yaml
pod "config-init" created
2) fix DNS search
https://github.com/rancher/rancher/issues/9303
Fix DNS resolution before running any more pods ( add service.ns.svc.cluster.local)
root@obriensystemskub0:~/oom/kubernetes/oneclick# cat /etc/resolv.conf
nameserver 192.168.241.2
search localdomain service.ns.svc.cluster.local
3) run or restart VID service as an example (one of 10 failing pods)
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces
onap-vid vid-mariadb-1357170716-k36tm 1/1 Running 0 10m
onap-vid vid-server-248645937-8tt6p 1/1 Running 0 10m
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl --namespace onap-vid logs -f vid-server-248645937-8tt6p
16-Jul-2017 02:46:48.707 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 22520 ms
tomcat comes up on 127.0.0.1:30200 for this colocated setup
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl get services --all-namespaces -o wide
onap-vid vid-mariadb None <none> 3306/TCP 1h app=vid-mariadb
onap-vid vid-server 10.43.14.244 <nodes> 8080:30200/TCP 1h app=vid-server
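Step 2 above can be verified with a short check of the search line before more pods are started. This is a sketch only: has_search_domain is a hypothetical helper name, and the domain is the one added to /etc/resolv.conf in the example above.

```shell
#!/bin/sh
# Sketch: succeed if the given domain appears on a "search" line of a
# resolv.conf style file (default: /etc/resolv.conf).
# has_search_domain is a hypothetical helper name.
has_search_domain() {
  local domain="$1" file="${2:-/etc/resolv.conf}"
  awk -v d="$domain" '
    $1 == "search" { for (i = 2; i <= NF; i++) if ($i == d) found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$file"
}

# Usage:
#   has_search_domain service.ns.svc.cluster.local || echo "search domain missing"
```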
Michael O'Brien
Good news – 32 of 33 pods are up (sdnc-portal is going through a restart).
Ran 2 parallel Rancher systems on 48G Ubuntu 16.04.2 VM’s on two 64G servers
Stats: Without DCAE (which is up to 40% of ONAP) we run at 33G – so I would expect a full system to be around 50G which means we can run on a P70 Thinkpad laptop with 64G.
Had to add some dns-search domains for k8s in interfaces to appear in resolv.conf after running the config pod.
Issues:
after these 2 config changes the pods come up within 25 min except policy-drools which takes 45 min (on 1 machine but not the other) and sdnc-portal (which is having issues with some node downloads)
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl --namespace onap-sdnc logs -f sdnc-portal-3375812606-01s1d | grep ERR
npm ERR! fetch failed https://registry.npmjs.org/is-utf8/-/is-utf8-0.2.1.tgz
I’ll look at instantiating the vFirewall VM’s and integrating DCAE next.
on 5820k 4.1GHz 12 vCores 48g Ubuntu 16.04.2 VM on 64g host
root@obriensystemskub0:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system heapster-859001963-bmlff 1/1 Running 5 43m 10.42.143.118 obriensystemskub0
kube-system kube-dns-1759312207-0x1xx 3/3 Running 8 43m 10.42.246.144 obriensystemskub0
kube-system kubernetes-dashboard-2463885659-jl5jf 1/1 Running 5 43m 10.42.117.156 obriensystemskub0
kube-system monitoring-grafana-1177217109-7gkl6 1/1 Running 4 43m 10.42.79.40 obriensystemskub0
kube-system monitoring-influxdb-1954867534-8nr2q 1/1 Running 5 43m 10.42.146.215 obriensystemskub0
kube-system tiller-deploy-1933461550-w77c5 1/1 Running 4 43m 10.42.1.66 obriensystemskub0
onap-aai aai-service-301900780-wp3w1 1/1 Running 0 25m 10.42.104.101 obriensystemskub0
onap-aai hbase-2985919495-zfs2c 1/1 Running 0 25m 10.42.208.135 obriensystemskub0
onap-aai model-loader-service-2352751609-4qb0x 1/1 Running 0 25m 10.42.25.139 obriensystemskub0
onap-appc appc-4266112350-gscxh 1/1 Running 0 25m 10.42.90.128 obriensystemskub0
onap-appc appc-dbhost-981835105-lp6tn 1/1 Running 0 25m 10.42.201.58 obriensystemskub0
onap-appc appc-dgbuilder-939982213-41znl 1/1 Running 0 25m 10.42.30.127 obriensystemskub0
onap-message-router dmaap-1381770224-c5xp8 1/1 Running 0 25m 10.42.133.232 obriensystemskub0
onap-message-router global-kafka-3488253347-zt8x9 1/1 Running 0 25m 10.42.235.227 obriensystemskub0
onap-message-router zookeeper-3757672320-bxkvs 1/1 Running 0 25m 10.42.14.4 obriensystemskub0
onap-mso mariadb-2610811658-r22z9 1/1 Running 0 25m 10.42.46.110 obriensystemskub0
onap-mso mso-2217182437-1r8fm 1/1 Running 0 25m 10.42.120.204 obriensystemskub0
onap-policy brmsgw-554754608-gssf8 1/1 Running 0 25m 10.42.84.128 obriensystemskub0
onap-policy drools-1184532483-kg8sr 1/1 Running 0 25m 10.42.62.198 obriensystemskub0
onap-policy mariadb-546348828-1ck21 1/1 Running 0 25m 10.42.118.120 obriensystemskub0
onap-policy nexus-2933631225-s1qjz 1/1 Running 0 25m 10.42.73.217 obriensystemskub0
onap-policy pap-235069217-qdf2r 1/1 Running 0 25m 10.42.157.211 obriensystemskub0
onap-policy pdp-819476266-zvncc 1/1 Running 0 25m 10.42.38.47 obriensystemskub0
onap-policy pypdp-3646772508-n801j 1/1 Running 0 25m 10.42.244.206 obriensystemskub0
onap-portal portalapps-157357486-gjnnc 1/1 Running 0 25m 10.42.83.144 obriensystemskub0
onap-portal portaldb-351714684-1n956 1/1 Running 0 25m 10.42.8.80 obriensystemskub0
onap-portal vnc-portal-1027553126-h6dhd 1/1 Running 0 25m 10.42.129.60 obriensystemskub0
onap-robot robot-44708506-t10kk 1/1 Running 0 31m 10.42.185.118 obriensystemskub0
onap-sdc sdc-be-4018435632-3k6k2 1/1 Running 0 25m 10.42.246.193 obriensystemskub0
onap-sdc sdc-cs-2973656688-kktn8 1/1 Running 0 25m 10.42.240.176 obriensystemskub0
onap-sdc sdc-es-2628312921-bg0dg 1/1 Running 0 25m 10.42.67.214 obriensystemskub0
onap-sdc sdc-fe-4051669116-3b9bh 1/1 Running 0 25m 10.42.42.203 obriensystemskub0
onap-sdc sdc-kb-4011398457-fgpkl 1/1 Running 0 25m 10.42.47.218 obriensystemskub0
onap-sdnc sdnc-1672832555-1h4s7 1/1 Running 0 25m 10.42.120.148 obriensystemskub0
onap-sdnc sdnc-dbhost-2119410126-48mt9 1/1 Running 0 25m 10.42.133.166 obriensystemskub0
onap-sdnc sdnc-dgbuilder-730191098-gj6g9 1/1 Running 0 25m 10.42.154.99 obriensystemskub0
onap-sdnc sdnc-portal-3375812606-01s1d 0/1 Running 0 25m 10.42.105.164 obriensystemskub0
onap-vid vid-mariadb-1357170716-vnmhr 1/1 Running 0 28m 10.42.218.225 obriensystemskub0
onap-vid vid-server-248645937-m67r9 1/1 Running 0 28m 10.42.227.81 obriensystemskub0
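In a listing this long it is easy to miss the one pod that is Running but not ready (sdnc-portal at 0/1 above). The following sketch filters the listing down to such pods; not_ready_pods is a hypothetical helper name and it assumes the column layout of kubectl get pods --all-namespaces shown above.

```shell
#!/bin/sh
# Sketch: read `kubectl get pods --all-namespaces` output on stdin and print
# namespace/name for every pod whose READY column is not n/n.
# not_ready_pods is a hypothetical helper name.
not_ready_pods() {
  awk '
    $1 == "NAMESPACE" { next }              # skip the header row
    {
      split($3, ready, "/")                 # READY column, e.g. 0/1
      if (ready[1] != ready[2]) print $1 "/" $2
    }
  '
}

# Usage:
#   kubectl get pods --all-namespaces | not_ready_pods
```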
nagaraja sr
Michael O'Brien, I've got to the point where I can access the portal login page, but after entering the credentials it keeps redirecting to port 8989 and fails, instead of using the externally mapped port (30215 in my case). Any thoughts?
i'm running on GCE with 40GB and only running sdc, message-router and portal for now.
Michael O'Brien
Nagaraja, yes, good question. I have actually been able to get to the point of running portal, as the 1.0.0 system is pretty stable now.
onap-portal portalapps 255.255.255.255 <nodes> 8006:30213/TCP,8010:30214/TCP,8989:30215/TCP 2h
I was recording a demo and ran into the same issue - I will raise a JIRA as we fix this and post here
http://portal.api.simpledemo.openecomp.org:30215/ECOMPPORTAL/login.htm
redirects to
Request URL:
http://portal.api.simpledemo.openecomp.org:8989/ECOMPPORTAL/applicationsHome
because of hardcoded parameters like the following in the DockerFile
Eddy Hautot
Hello, was there a workaround for this in the end?
I ran the OOM installation from scratch and managed to log in to Portal by changing the port back to 30215 after the login redirect.
Also, when I log in as the cs0008 user and click on SDC, I get: "can’t establish a connection to the server at sdc.api.simpledemo.onap.org:8181" (should this be changed to port 30206?)
Do you know which config has to be changed for this?
Thank you
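For reference, the externally mapped port for an internal port such as 8989 can be read from the PORT(S) column of kubectl get services (for example 8006:30213/TCP,8010:30214/TCP,8989:30215/TCP for portalapps above). A sketch of that lookup, with node_port_for as a hypothetical helper name; it does not fix the redirect, it only finds the port to substitute by hand.

```shell
#!/bin/sh
# Sketch: given a PORT(S) string from `kubectl get services` and an internal
# port, print the matching external (node) port.
# node_port_for is a hypothetical helper name.
node_port_for() {
  local ports="$1" target="$2" entry internal rest
  local IFS=','
  for entry in $ports; do
    internal="${entry%%:*}"   # part before the colon, e.g. 8989
    rest="${entry#*:}"        # part after the colon, e.g. 30215/TCP
    if [ "$internal" = "$target" ]; then
      printf '%s\n' "${rest%%/*}"
      return 0
    fi
  done
  return 1
}

# Usage:
#   node_port_for "8006:30213/TCP,8010:30214/TCP,8989:30215/TCP" 8989
```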
Mike Elliott
Are you accessing the ECOMP Portal via the 'onap-portal vnc-portal-1027553126-h6dhd' container?
This container was added to the standard ONAP deployment so one may VNC into the ONAP deployment instance (namespace) and have networking fully resolved within K8s.
Michael O'Brien
Mike, Was just writing a question to you - yes looks like I am using the wrong container - reworking now
thank you
Michael O'Brien
Nagaraja,
Portal access via the vnc-portal container (port 30211) is now documented above in
Running ONAP using the vnc-portal
/michael
Vaibhav Chopra
Hi all,
I am new to this Kubernetes installation of ONAP and am installing the ONAP components one by one on my VM (due to memory constraints).
I want to see if the PODs are working fine
I launched robot component:-
onap-robot robot-1597903591-1tx35 1/1 Running 0 2h 10.42.104.187 localhost
and logged in to same via
kubectl -n onap-robot exec -it robot-1597903591-1tx35 /bin/bash
Now, do I need to mount some directory to see the containers, and how will the docker processes run inside it?
BR/
VC
Vaibhav Chopra
The Docker processes are not starting on their own, maybe because of the internet proxy being used. I am trying to run the install and setup manually by logging in to each component.
Michael O'Brien
Vaibhav,
Hi, there is a combination of files - some are in the container itself - see /var/opt
some are off the shared file system on the host - see /dockerdata-nfs
In the case of robot - you have spun up one pod - each pod has a single docker container, to see the other pods/containers - kubectl into each like you have into robot - just change the pod name. kubectl is an abstraction on top of docker - so you don't need to directly access docker containers.
/michael
Geora Barsky
Vaibhav, if you are trying to see the status of the pod or look at the log file, you can do it also through Rancher / Kubernetes dashboard :
Vaibhav Chopra
Hi Michael,
Yes, I can see the mounted directories and found robot_install.sh in /var/opt/OpenECOMP_ETE/demo/boot
On the K8s Dashboard and CLI, the pod is in the Running state, but when I log in to any of them (via kubectl) I am unable to see any docker process running via docker ps. (Docker itself is not even installed.)
I think this should ideally be taken care of by the pod itself, right? Or do we need to go inside each component and run its specific installation script?
BR/
VC
Michael O'Brien
Vaibhav, Hi, the architecture of kubernetes is such that it manages docker containers - we are not running docker on docker. Docker ps will only be possible on the host machine(s)/vm(s) that kubernetes is running on - you will see the wrapper docker containers running the kubernetes and rancher undercloud.
When you "kubectl exec -it" into a pod you have entered a docker container, the same as with "docker exec -it". At that point you are in a container process - try doing a "ps -ef | grep java" to see if a java process is running, for example. Note that by the nature of docker most containers will have a minimal linux install - so some do not include the ps command, for example.
If you check the instructions above you will see the first step is to install docker 1.12 only on the host - as you end up with 1 or more hosts running a set of docker containers after ./createAll.bash finishes
example - try the mso jboss container - it is one of the heavyweight containers
root@ip-172-31-93-122:~# kubectl -n onap-mso exec -it mso-371905462-w0mcj bash
root@mso-371905462-w0mcj:/# ps -ef | grep java
root 1920 1844 0 Aug27 ? 00:28:33 java -D[Standalone] -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Xms64m -Xmx4g -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=1g -Djboss.bind.address=0.0.0.0 -Djboss.bind.address.management=0.0.0.0 -Dmso.db=MARIADB -Dmso.config.path=/etc/mso/config.d/ -Dorg.jboss.boot.log.file=/opt/jboss/standalone/log/server.log -Dlogging.configuration=file:/opt/jboss/standalone/configuration/logging.properties -jar /opt/jboss/jboss-modules.jar -mp /opt/jboss/modules org.jboss.as.standalone -Djboss.home.dir=/opt/jboss -Djboss.server.base.dir=/opt/jboss/standalone -c standalone-full-ha-mso.xml
if you want to see the k8s wrapped containers - do a docker ps on the host
root@ip-172-31-93-122:~# docker ps | grep mso
9fed2b7ebd1d nexus3.onap.org:10001/openecomp/mso@sha256:ab3a447956577a0f339751fb63cc2659e58b9f5290852a90f09f7ed426835abe "/docker-files/script" 4 days ago Up 4 days k8s_mso_mso-371905462-w0mcj_onap-mso_11da22bf-8b3d-11e7-9e1a-0289899d0a5f_0
e4171a2b73d8 nexus3.onap.org:10001/mariadb@sha256:3821f92155bf4311a59b7ec6219b79cbf9a42c75805000a7c8fe5d9f3ad28276 "/docker-entrypoint.s" 4 days ago Up 4 days k8s_mariadb_mariadb-786536066-87g9d_onap-mso_11bc6958-8b3d-11e7-9e1a-0289899d0a5f_0
8ba86442fbde gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 days ago Up 4 days k8s_POD_mso-371905462-w0mcj_onap-mso_11da22bf-8b3d-11e7-9e1a-0289899d0a5f_0
f099c5613bf1 gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 days ago Up 4 days k8s_POD_mariadb-786536066-87g9d_onap-mso_11bc6958-8b3d-11e7-9e1a-0289899d0a5f_0
Cyril Nleng
Hi all,
I am new to kubernetes installation of ONAP and have problems cloning onap repository.
I have tried git clone -b release-1.0.0 http://gerrit.onap.org/r/oom
but ended up with the following error
fatal: unable to access 'http://gerrit.onap.org/r/oom/': The requested URL returned error: 403
I also tried to use ssh git clone -b release-1.0.0 ssh://cnleng@gerrit.onap.org:29418/oom
but I cannot access settings on https://gerrit.onap.org (Already have an account on Linux foundation) to copy my ssh keys
Any help will be appreciated.
Thanks
Michael O'Brien
403 in your case might be due to your proxy or firewall - check access away from your company if possible
Verified the URL
root@ip-172-31-90-90:~/test# git clone -b release-1.0.0 http://gerrit.onap.org/r/oom
Cloning into 'oom'...
remote: Counting objects: 896, done
remote: Finding sources: 100% (262/262)
remote: Total 1701 (delta 96), reused 1667 (delta 96)
Receiving objects: 100% (1701/1701), 1.08 MiB | 811.00 KiB/s, done.
Resolving deltas: 100% (588/588), done.
Checking connectivity... done.
If you log in to gerrit and navigate to the oom project, it will supply you with anon, https and ssh urls - try each of them, they should work.
Geora Barsky
Hi, I am trying to install ONAP components through oom, but getting the following errors:
Search Line limits were exceeded, some dns names have been omitted, the applied search line is: onap-appc.svc.cluster.local svc.cluster.local cluster.local kubelet.kubernetes.rancher.internal kubernetes.rancher.internal rancher.internal
I tried to edit /etc/resolv.conf according to Michael's comment above:
nameserver <server ip>
search localdomain service.ns.svc.cluster.local
but it does not seem to help
Please advise how to resolve this DNS issue
Thanks
Geora
Michael O'Brien
Geora, hi, that is a red herring unfortunately - there is a bug in rancher where they add more than 5 domains to the search tree - you can ignore these warnings - the resolv.conf change turns out to have no effect - it has been removed from the page and remains only in the comment history
https://github.com/rancher/rancher/issues/9303
/michael
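For anyone who wants to confirm that the warning is only about the number of search domains, here is a minimal sketch. It counts the entries on the search line of resolv.conf-style input. The limit of 6 is an assumption inferred from the six domains on the applied search line in the log above, and the function names are illustrative only.

```shell
#!/bin/sh
# Count the domains listed on the "search" line of resolv.conf-style input (stdin).
count_search_domains() {
    awk '$1 == "search" { n = NF - 1 } END { print n + 0 }'
}

# Assumed limit: the applied search line in the log above carries 6 domains.
SEARCH_LIMIT=${SEARCH_LIMIT:-6}

# Print a one-line verdict for the input on stdin.
check_search_line() {
    n=$(count_search_domains)
    if [ "$n" -gt "$SEARCH_LIMIT" ]; then
        echo "search line has $n domains (limit $SEARCH_LIMIT): extra entries will be omitted"
    else
        echo "search line has $n domains (limit $SEARCH_LIMIT): ok"
    fi
}
```

Inside a pod this could be run as `check_search_line < /etc/resolv.conf`. It only reports the count and changes nothing, which fits the advice that the warning can be ignored.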
Michael O'Brien
todo: update table/diagram on aai for 1.1 coming in
root@obriensystemskub0:~/11/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
default config-init 0/1 Completed 0 45d
kube-system heapster-859001963-kz210 1/1 Running 5 46d
kube-system kube-dns-1759312207-jd5tf 3/3 Running 8 46d
kube-system kubernetes-dashboard-2463885659-xv986 1/1 Running 4 46d
kube-system monitoring-grafana-1177217109-sm5nq 1/1 Running 4 46d
kube-system monitoring-influxdb-1954867534-vvb84 1/1 Running 4 46d
kube-system tiller-deploy-1933461550-gdxch 1/1 Running 4 46d
onap config-init 0/1 Completed 0 1h
onap-aai aai-dmaap-2612279050-g4qjj 1/1 Running 0 1h
onap-aai aai-kafka-3336540298-kshzc 1/1 Running 0 1h
onap-aai aai-resources-2582573456-n1v1q 0/1 CrashLoopBackOff 9 1h
onap-aai aai-service-3847504356-03rk2 0/1 Init:0/1 3 1h
onap-aai aai-traversal-1020522763-njrw7 0/1 Completed 10 1h
onap-aai aai-zookeeper-3839400401-160pk 1/1 Running 0 1h
onap-aai data-router-1134329636-f5g2j 1/1 Running 0 1h
onap-aai elasticsearch-2888468814-4pmgd 1/1 Running 0 1h
onap-aai gremlin-1948549042-j56p9 0/1 CrashLoopBackOff 7 1h
onap-aai hbase-1088118705-f29c1 1/1 Running 0 1h
onap-aai model-loader-service-784161734-3njbr 1/1 Running 0 1h
onap-aai search-data-service-237180539-0sj6c 1/1 Running 0 1h
onap-aai sparky-be-3826115676-c2wls 1/1 Running 0 1h
onap-appc appc-2493901092-041m9 1/1 Running 0 1h
onap-appc appc-dbhost-3869943665-5d0vb 1/1 Running 0 1h
onap-appc appc-dgbuilder-2279934547-t2qqx 0/1 Running 1 1h
onap-message-router dmaap-3009751734-w59nn 1/1 Running 0 1h
onap-message-router global-kafka-1350602254-f8vj6 1/1 Running 0 1h
onap-message-router zookeeper-2151387536-sw7bn 1/1 Running 0 1h
onap-mso mariadb-3820739445-qjrmn 1/1 Running 0 1h
onap-mso mso-278039889-4379l 1/1 Running 0 1h
onap-policy brmsgw-1958800448-p855b 1/1 Running 0 1h
onap-policy drools-3844182126-31hmg 0/1 Running 0 1h
onap-policy mariadb-2047126225-4hpdb 1/1 Running 0 1h
onap-policy nexus-851489966-h1l4b 1/1 Running 0 1h
onap-policy pap-2713970993-kgssq 1/1 Running 0 1h
onap-policy pdp-3122086202-dqfz6 1/1 Running 0 1h
onap-policy pypdp-1774542636-vp3tt 1/1 Running 0 1h
onap-portal portalapps-2603614056-4030t 1/1 Running 0 1h
onap-portal portaldb-122537869-8h4hd 1/1 Running 0 1h
onap-portal portalwidgets-3462939811-9rwtl 1/1 Running 0 1h
onap-portal vnc-portal-2396634521-7zlvf 0/1 Init:2/5 3 1h
onap-robot robot-2697244605-cbkzp 1/1 Running 0 1h
onap-sdc sdc-be-2266987346-r321s 0/1 Running 0 1h
onap-sdc sdc-cs-1003908407-46k1q 1/1 Running 0 1h
onap-sdc sdc-es-640345632-7ldhv 1/1 Running 0 1h
onap-sdc sdc-fe-783913977-ccg59 0/1 Init:0/1 3 1h
onap-sdc sdc-kb-1525226917-j2n48 1/1 Running 0 1h
onap-sdnc sdnc-2490795740-pfwdz 1/1 Running 0 1h
onap-sdnc sdnc-dbhost-2647239646-5spg0 1/1 Running 0 1h
onap-sdnc sdnc-dgbuilder-1138876857-1b40z 0/1 Running 0 1h
onap-sdnc sdnc-portal-3897220020-0tt9t 0/1 Running 1 1h
onap-vid vid-mariadb-2479414751-n33qf 1/1 Running 0 1h
onap-vid vid-server-1654857885-jd1jc 1/1 Running 0 1h
20170902 update - everything up (minus to-be-merged-dcae)
root@ip-172-31-93-160:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config-init 0/1 Completed 0 21m
onap-aai aai-service-3321436576-2snd6 0/1 PodInitializing 0 18m
onap-policy drools-3066421234-rbpr9 0/1 Init:0/1 1 18m
onap-portal vnc-portal-700404418-r61hm 0/1 Init:2/5 1 18m
onap-sdc sdc-fe-3467675014-v8jxm 0/1 Running 0 18m
root@ip-172-31-93-160:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces | grep 0/1
root@ip-172-31-93-160:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-7wdct 1/1 Running 0 1d
kube-system kube-dns-2514474280-kmd6v 3/3 Running 3 1d
kube-system kubernetes-dashboard-716739405-xxn5k 1/1 Running 0 1d
kube-system monitoring-grafana-3552275057-hvfw8 1/1 Running 0 1d
kube-system monitoring-influxdb-4110454889-7s5fj 1/1 Running 0 1d
kube-system tiller-deploy-737598192-jpggg 1/1 Running 0 1d
onap-aai aai-dmaap-522748218-5rw0v 1/1 Running 0 21m
onap-aai aai-kafka-2485280328-6264m 1/1 Running 0 21m
onap-aai aai-resources-3302599602-fn4xm 1/1 Running 0 21m
onap-aai aai-service-3321436576-2snd6 1/1 Running 0 21m
onap-aai aai-traversal-2747464563-3c8m7 1/1 Running 0 21m
onap-aai aai-zookeeper-1010977228-l2h3h 1/1 Running 0 21m
onap-aai data-router-1397019010-t60wm 1/1 Running 0 21m
onap-aai elasticsearch-2660384851-k4txd 1/1 Running 0 21m
onap-aai gremlin-1786175088-m39jb 1/1 Running 0 21m
onap-aai hbase-3880914143-vp8zk 1/1 Running 0 21m
onap-aai model-loader-service-226363973-wx6s3 1/1 Running 0 21m
onap-aai search-data-service-1212351515-q4k68 1/1 Running 0 21m
onap-aai sparky-be-2088640323-h2pbx 1/1 Running 0 21m
onap-appc appc-1972362106-4zqh8 1/1 Running 0 21m
onap-appc appc-dbhost-2280647936-s041d 1/1 Running 0 21m
onap-appc appc-dgbuilder-2616852186-g9sng 1/1 Running 0 21m
onap-message-router dmaap-3565545912-w5lp4 1/1 Running 0 21m
onap-message-router global-kafka-701218468-091rt 1/1 Running 0 21m
onap-message-router zookeeper-555686225-vdp8w 1/1 Running 0 21m
onap-mso mariadb-2814112212-zs7lk 1/1 Running 0 21m
onap-mso mso-2505152907-xdhmb 1/1 Running 0 21m
onap-policy brmsgw-362208961-ks6jb 1/1 Running 0 21m
onap-policy drools-3066421234-rbpr9 1/1 Running 0 21m
onap-policy mariadb-2520934092-3jcw3 1/1 Running 0 21m
onap-policy nexus-3248078429-4k29f 1/1 Running 0 21m
onap-policy pap-4199568361-p3h0p 1/1 Running 0 21m
onap-policy pdp-785329082-3c8m5 1/1 Running 0 21m
onap-policy pypdp-3381312488-q2z8t 1/1 Running 0 21m
onap-portal portalapps-2799319019-00qhb 1/1 Running 0 21m
onap-portal portaldb-1564561994-50mv0 1/1 Running 0 21m
onap-portal portalwidgets-1728801515-r825g 1/1 Running 0 21m
onap-portal vnc-portal-700404418-r61hm 1/1 Running 0 21m
onap-robot robot-349535534-lqsvp 1/1 Running 0 21m
onap-sdc sdc-be-1839962017-n3hx3 1/1 Running 0 21m
onap-sdc sdc-cs-2640808243-tc9ck 1/1 Running 0 21m
onap-sdc sdc-es-227943957-f6nfv 1/1 Running 0 21m
onap-sdc sdc-fe-3467675014-v8jxm 1/1 Running 0 21m
onap-sdc sdc-kb-1998598941-57nj1 1/1 Running 0 21m
onap-sdnc sdnc-250717546-xmrmw 1/1 Running 0 21m
onap-sdnc sdnc-dbhost-3807967487-tdr91 1/1 Running 0 21m
onap-sdnc sdnc-dgbuilder-3446959187-dn07m 1/1 Running 0 21m
onap-sdnc sdnc-portal-4253352894-hx9v8 1/1 Running 0 21m
onap-vid vid-mariadb-2932072366-n5qw1 1/1 Running 0 21m
onap-vid vid-server-377438368-kn6x4 1/1 Running 0 21m
root@ip-172-31-93-160:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces | grep 0/1
health passes except for to-be-merged dcae
root@ip-172-31-93-160:/dockerdata-nfs/onap/robot# ls
authorization demo-docker.sh demo-k8s.sh ete-docker.sh ete-k8s.sh eteshare robot
root@ip-172-31-93-160:/dockerdata-nfs/onap/robot# ./ete-docker.sh health
------------------------------------------------------------------------------
Basic SDNGC Health Check | PASS |
------------------------------------------------------------------------------
Basic A&AI Health Check | PASS |
------------------------------------------------------------------------------
Basic Policy Health Check | PASS |
------------------------------------------------------------------------------
Basic MSO Health Check | PASS |
------------------------------------------------------------------------------
Basic ASDC Health Check | PASS |
------------------------------------------------------------------------------
Basic APPC Health Check | PASS |
------------------------------------------------------------------------------
Basic Portal Health Check | PASS |
------------------------------------------------------------------------------
Basic Message Router Health Check | PASS |
------------------------------------------------------------------------------
Basic VID Health Check | PASS |
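The `grep 0/1` filter used in the listings above also matches the config-init pod, which is expected to stay at 0/1 once it has Completed. A minimal sketch of a slightly stricter filter, assuming the column order shown above (NAMESPACE NAME READY STATUS ...); the function name is illustrative:

```shell
#!/bin/sh
# Read `kubectl get pods --all-namespaces -a` output on stdin and print
# "namespace/pod status" for every pod with 0 ready containers,
# ignoring the header row and pods that finished as Completed.
not_ready_pods() {
    awk '$1 != "NAMESPACE" && $3 ~ /^0\// && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}
```

Usage with the same flags as in the listings above: `kubectl get pods --all-namespaces -a | not_ready_pods`. Empty output means every pod is either ready or completed.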
nagaraja sr
Has anyone managed to run ONAP on Kubernetes with more than one node? I'm unclear about how the /dockerdata-nfs volume mount works in the case of multiple nodes.
1) In my Azure setup, I have one master node and 4 agent nodes (Standard D3 - 4 CPU / 14 GB). After running the config-init pod (to completion) I do not see the /dockerdata-nfs directory being created on the master node. I am not sure how to check this directory on all the agent nodes. Is this directory expected to be created on all the agent nodes? If so, are they kept synchronized?
2) After the cluster is restarted, there is a possibility that pods will run on a different set of nodes, so if /dockerdata-nfs is not kept in sync between the agent nodes, the data will not be persisted.
PS: I did not use rancher. I created the k8s cluster using acs-engine.
Shane Daniel
Hi nagaraja,
The mounting of the shared dockerdata-nfs volume does not appear to happen automatically. You can install nfs-kernel-server and mount a shared drive manually. If you are running rancher on the master node (the one with the files in the /dockerdata-nfs directory), mount that directory to the agent nodes:
On Master:
# apt-get install nfs-kernel-server
Modify /etc/exports to share the directory from master to agent nodes
# vi /etc/exports
# systemctl restart nfs-kernel-server
On client nodes:
# apt-get install nfs-common
delete existing data:
# rm -fr /dockerdata-nfs/
# mkdir -p /dockerdata-nfs
# mount <master ip>:/dockerdata-nfs/ /dockerdata-nfs/
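The /etc/exports edit is the one step above without an example. As a rough sketch, the helper below prints one export entry per agent node. The option list (rw, sync, no_root_squash, no_subtree_check) and the IP addresses in the usage line are assumptions for illustration, not something taken from this thread; check exports(5) for what suits your setup.

```shell
#!/bin/sh
# Print one /etc/exports entry per agent node IP given as an argument.
# The export options below are an assumed, commonly used set, not ONAP-specific.
exports_entries() {
    for ip in "$@"; do
        printf '/dockerdata-nfs %s(rw,sync,no_root_squash,no_subtree_check)\n' "$ip"
    done
}
```

For example, `exports_entries 10.0.0.11 10.0.0.12 >> /etc/exports` on the master (hypothetical agent IPs), followed by the nfs-kernel-server restart shown above.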
Cyril Nleng
Hi All,
I am trying to install ONAP on Kubernetes and I got the following error while trying to run ./createConfig.sh -n onap command:
sudo: unable to execute ./createConfig.sh: No such file or directory
Hangup
Does anyone have an idea? (kubernetes /helm is already up and running)
Thanks,
Borislav Glozman
Please check whether the file is in DOS format. You might want to run dos2unix on it (and the others).
Cyril Nleng
Thank you for your help. Indeed this was the cause of the problem.
Michael O'Brien
we need to 755 the file - it was committed with the wrong permissions to the 1.0.0 branch
OOM-218
the instructions reference this.
% chmod 777 createConfig.sh (1.0 branch only)
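Both causes mentioned in this exchange (DOS line endings and missing execute permission) can be checked before running the script. A minimal sketch; the function name `check_script` is illustrative:

```shell
#!/bin/sh
# Report the first problem that would stop a script from being executed:
# missing file, DOS (CRLF) line endings, or no execute permission.
check_script() {
    f=$1
    if [ ! -f "$f" ]; then echo "missing"; return 1; fi
    # A carriage return anywhere in the file indicates DOS line endings.
    if grep -q "$(printf '\r')" "$f"; then echo "crlf"; return 1; fi
    if [ ! -x "$f" ]; then echo "not-executable"; return 1; fi
    echo "ok"
}
```

Usage: `check_script createConfig.sh`. If it prints `crlf`, dos2unix (or `tr -d '\r'` into a new file) removes the carriage returns; if it prints `not-executable`, the chmod above applies.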
Cyril Nleng
Hi All,
I am trying to install ONAP on Kubernetes and I got the following error while trying to run ./createAll.bash -n onap -a robot|appc|aai command:
Command 'mppc' from package 'makepp' (universe)
Command 'ppc' from package 'pearpc' (universe)
appc: command not found
No command 'aai' found, did you mean:
Command 'axi' from package 'afnix' (universe)
Command 'ali' from package 'nmh' (universe)
Command 'ali' from package 'mailutils-mh' (universe)
Command 'aa' from package 'astronomical-almanac' (universe)
Command 'fai' from package 'fai-client' (universe)
Command 'cai' from package 'emboss' (universe)
aai: command not found
Does anyone have an idea? (kubernetes /helm is already up and running)
Thanks,
nagaraja sr
You need to run the command for each ONAP component one by one.
i.e, ./createAll.bash -n onap -a robot
when that's completed,
./createAll.bash -n onap -a aai
./createAll.bash -n onap -a appc
and so on for each onap component you wish to install.
Cyril Nleng
Thanks for the help,
but right now it looks like Kubernetes is not able to pull an image from registry
kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
onap-robot robot-3494393958-8fl0q 0/1 ImagePullBackOff 0 5m
Do you have any idea why?
Michael O'Brien
There was an issue (happens periodically) with the nexus3 repo.
Also check that you are not having proxy issues.
Usually we post the ONAP partner we are with either via our email or on our profile - thank you in advance.
/michael
Cyril Nleng
Hi All,
I am trying to install ONAP on Kubernetes and I got the following behavior while trying to run the ./createAll.bash -n onap -a robot|appc|aai command:
It looks like Kubernetes is not able to pull an image from the registry
kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
onap-robot robot-3494393958-8fl0q 0/1 ImagePullBackOff 0 5m
Do you have any idea why?
Alex Lee
Hi Michael O'Brien, I am trying to install ONAP the way described above and encountered a problem.
The hbase pod in Kubernetes reports “Readiness probe failed: dial tcp 10.42.76.162:8020: getsockopt: connection refused”. It seems the hbase service did not start as expected. The hbase container logs in Rancher show:
Starting namenodes on [hbase]
hbase: chown: missing operand after '/opt/hadoop-2.7.2/logs'
hbase: Try 'chown --help' for more information.
hbase: starting namenode, logging to /opt/hadoop-2.7.2/logs/hadoop--namenode-hbase.out
localhost: starting datanode, logging to /opt/hadoop-2.7.2/logs/hadoop--datanode-hbase.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.2/logs/hadoop--secondarynamenode-hbase.out
starting zookeeper, logging to /opt/hbase-1.2.3/bin/../logs/hbase--zookeeper-hbase.out
starting master, logging to /opt/hbase-1.2.3/bin/../logs/hbase--master-hbase.out
starting regionserver, logging to /opt/hbase-1.2.3/bin/../logs/hbase--1-regionserver-hbase.out
Michael O'Brien
Nexus3 usually has intermittent connection issues - you may have to wait up to 30 min. Yesterday I was able to bring it up on 3 systems with the 20170906 tag (all outside the firewall)
I assume MSO (earlier in the startup) worked - so you don't have a proxy issue
/michael
Michael O'Brien
verified
root@ip-172-31-93-122:~/oom_20170908/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-q5vns 1/1 Running 3 12d
kube-system kube-dns-646531078-tzhbj 3/3 Running 6 12d
kube-system kubernetes-dashboard-716739405-zc56m 1/1 Running 3 12d
kube-system monitoring-grafana-3552275057-gwcv0 1/1 Running 3 12d
kube-system monitoring-influxdb-4110454889-m29w3 1/1 Running 3 12d
kube-system tiller-deploy-737598192-rndtq 1/1 Running 3 12d
onap config 0/1 Completed 0 10m
onap-aai aai-resources-3302599602-6mggg 1/1 Running 0 7m
onap-aai aai-service-3321436576-qc7tx 1/1 Running 0 7m
onap-aai aai-traversal-2747464563-bvbqn 1/1 Running 0 7m
onap-aai data-router-1397019010-d4bh1 1/1 Running 0 7m
onap-aai elasticsearch-2660384851-r9v3k 1/1 Running 0 7m
onap-aai gremlin-1786175088-q5z1k 1/1 Running 1 7m
onap-aai hbase-3880914143-0nn8x 1/1 Running 0 7m
onap-aai model-loader-service-226363973-2wr0k 1/1 Running 0 7m
onap-aai search-data-service-1212351515-b04rz 1/1 Running 0 7m
onap-aai sparky-be-2088640323-kg4ts 1/1 Running 0 7m
onap-appc appc-1972362106-j27bp 1/1 Running 0 7m
onap-appc appc-dbhost-4156477017-13mhs 1/1 Running 0 7m
onap-appc appc-dgbuilder-2616852186-4rtxz 1/1 Running 0 7m
onap-message-router dmaap-3565545912-nqcs1 1/1 Running 0 8m
onap-message-router global-kafka-3548877108-x4gqb 1/1 Running 0 8m
onap-message-router zookeeper-2697330950-6l8ht 1/1 Running 0 8m
onap-mso mariadb-2019543522-1jc0v 1/1 Running 0 8m
onap-mso mso-2505152907-cj74x 1/1 Running 0 8m
onap-policy brmsgw-3913376880-5v5p4 1/1 Running 0 7m
onap-policy drools-873246297-1h059 1/1 Running 0 7m
onap-policy mariadb-922099840-qbpj7 1/1 Running 0 7m
onap-policy nexus-2268491532-pqt8t 1/1 Running 0 7m
onap-policy pap-1694585402-7mdtg 1/1 Running 0 7m
onap-policy pdp-3638368335-zptqk 1/1 Running 0 7m
onap-portal portalapps-2799319019-twhn2 1/1 Running 0 8m
onap-portal portaldb-2714869748-bt1c8 1/1 Running 0 8m
onap-portal portalwidgets-1728801515-gr616 1/1 Running 0 8m
onap-portal vnc-portal-1920917086-s9mj9 1/1 Running 0 8m
onap-robot robot-1085296500-jkkln 1/1 Running 0 8m
onap-sdc sdc-be-1839962017-nh4bm 1/1 Running 0 7m
onap-sdc sdc-cs-428962321-hhnmk 1/1 Running 0 7m
onap-sdc sdc-es-227943957-mrnng 1/1 Running 0 7m
onap-sdc sdc-fe-3467675014-nq72v 1/1 Running 0 7m
onap-sdc sdc-kb-1998598941-2bd73 1/1 Running 0 7m
onap-sdnc sdnc-250717546-0dtr7 1/1 Running 0 8m
onap-sdnc sdnc-dbhost-2348786256-96gvr 1/1 Running 0 8m
onap-sdnc sdnc-dgbuilder-3446959187-9993t 1/1 Running 0 8m
onap-sdnc sdnc-portal-4253352894-sd7mg 1/1 Running 0 8m
onap-vid vid-mariadb-2940400992-mmtbn 1/1 Running 0 8m
onap-vid vid-server-377438368-z3tfv 1/1 Running 0 8m
From: onap-discuss-bounces@lists.onap.org [mailto:onap-discuss-bounces@lists.onap.org] On Behalf Of Mandeep Khinda
Sent: Friday, September 8, 2017 14:36
To: onap-discuss@lists.onap.org
Subject: [onap-discuss] [oom] config pod changes
OOM users,
I’ve just pushed a change that requires a re-build of the /dockerdata-nfs/onap/ mount on your K8s host.
Basically, what I’ve tried to do is port over the heat stack version of ONAP’s configuration mechanism. The heat way of running ONAP writes files to /opt/config/ based on the stack’s environment file, which has the details related to each user’s environment. These values are then swapped into the containers on the various VMs using scripts.
Now that we are using helm for OOM, I was able to do something similar in order to start trying to run the vFW/vLB demo use cases.
This story tracks the functionality that was needed: https://jira.onap.org/browse/OOM-277
I have also been made aware that this change requires K8s 1.6 as I am making use of the “envFrom” https://kubernetes.io/docs/api-reference/v1.6/#container-v1-core. We stated earlier that we are setting minimum requirements of K8s 1.7 and rancher 1.6 for OOM so hopefully this isn’t a big issue.
It boils down to this:
/oom/kubernetes/config/onap-parameters.yaml is kind of like file “onap_openstackRC.env” and you will need to define some required values otherwise the config pod deployment will fail.
A sample can be found here:
/oom/kubernetes/config/onap-parameters-sample.yaml
Note: If you don’t care about interacting with openstack to launch VNFs then, you can just use the sample file contents.
continue to run createConfig.sh -n onap and it will install the config files and swap in your environment specific values before it completes.
createAll.bash -n onap to recreate your ONAP K8s environment and go from there.
Thx,
Mandeep
--
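For those who do not need OpenStack integration, the "just use the sample file contents" step from the mail above can be sketched as follows. The function name is illustrative and the directory is passed in as an argument, since the location of your oom clone is not fixed here.

```shell
#!/bin/sh
# Copy onap-parameters-sample.yaml to onap-parameters.yaml in the given
# config directory, unless a customised onap-parameters.yaml already exists.
prepare_parameters() {
    config_dir=$1
    if [ -f "$config_dir/onap-parameters.yaml" ]; then
        echo "keeping existing onap-parameters.yaml"
    else
        cp "$config_dir/onap-parameters-sample.yaml" "$config_dir/onap-parameters.yaml" &&
            echo "copied sample to onap-parameters.yaml"
    fi
}
```

After `prepare_parameters <path to>/oom/kubernetes/config`, run createConfig.sh -n onap and then createAll.bash -n onap as described in the mail.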
Liang Ke
Hi, ALL
1. I am trying to install ONAP on Kubernetes and encountered a problem.
I create msb pods first by command "./createAll.bash -n onap -a msb", then
create aai pods by command "./createAll.bash -n onap -a aai".
The problem is that the serviceName and url entries of aai are not registered with msb as expected.
I find the code of aai project has those lines "
msb.onap.org/service-info: '[
so I think msb can not support domain name right now?
2. Also, three of the aai pods cannot be created normally.
Sathvik Manoj
Hi all,
Goal: I want to deploy and manage vFirewall router using ONAP.
I installed ONAP on Kubernetes using oom(release-1.0.0). All Services are running except DCAE as it is not yet completely implemented in Kubernetes. Also, I have an OpenStack cluster configured separately.
How can I integrate DCAE to the above Kubernetes cluster?
Thanks,
Sathvik M
Michael O'Brien
DCAE is still coming in (1.0 version in 1.1) - this component is an order of magnitude more complex than any other ONAP deployment - you can track
https://jira.onap.org/browse/OOM-176
Michael O'Brien
DCAE is in OOM Kubernetes as of 20170913
onap-dcae cdap0-4078069992-ql1fk 1/1 Running 0 41m
onap-dcae cdap1-4039904165-r8f2v 1/1 Running 0 41m
onap-dcae cdap2-422364317-827g3 1/1 Running 0 41m
onap-dcae dcae-collector-common-event-1149898616-1f8vt 1/1 Running 0 41m
onap-dcae dcae-collector-dmaapbc-1520987080-9drlt 1/1 Running 0 41m
onap-dcae dcae-controller-2121147148-1kd7f 1/1 Running 0 41m
onap-dcae dcae-pgaas-2006588677-0wlf1 1/1 Running 0 41m
onap-dcae dmaap-1927146826-6wt83 1/1 Running 0 41m
onap-dcae kafka-2590900334-29qsk 1/1 Running 0 41m
onap-dcae zookeeper-2166102094-4jgw0 1/1 Running 0 41m
Sathvik Manoj
That means DCAE is working... Is it available in 1.0 version of OOM or 1.1?
Thanks,
Sathvik M
Sathvik Manoj
Hi Michael,
As DCAE is available in OOM 1.1, I started an installation of 1.1. Out of the 10 A&AI containers, 2 are not coming up.
Repeatedly I am seeing the below prints in
Can someone help me fix this issue?
Thanks,
Sathvik M
Vidhu Shekhar Pandey
Hi Michael,
I am using OOM version 1.1.0. I have pre-pulled all the images using prepull_docker.sh. But after creating the pods using the createAll.sh script, all the pods are coming up except DCAE. Is DCAE supported in the 1.1.0 release? If not, when is it expected to be functional? Will I be able to run the vFW demo closed loop without DCAE?
More details below:
The DCAE specific images shown are:
root@hcl:~# docker images | grep dcae
nexus3.onap.org:10001/openecomp/dcae-controller 1.1-STAGING-latest ff839a80b8f1 12 weeks ago 694.6 MB
nexus3.onap.org:10001/openecomp/dcae-collector-common-event 1.1-STAGING-latest e3daaf41111b 12 weeks ago 537.3 MB
nexus3.onap.org:10001/openecomp/dcae-dmaapbc 1.1-STAGING-latest 1fcf5b48d63b 7 months ago 328.1 MB
The DCAE health check is failing
Starting Xvfb on display :88 with res 1280x1024x24
Executing robot tests at log level TRACE
==============================================================================
OpenECOMP ETE
==============================================================================
OpenECOMP ETE.Robot
==============================================================================
OpenECOMP ETE.Robot.Testsuites
==============================================================================
OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp components are...
==============================================================================
Basic DCAE Health Check | FAIL |
ConnectionError: HTTPConnectionPool(host='dcae-controller.onap-dcae', port=8080): Max retries exceeded with url: /healthcheck (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f26aee31550>: Failed to establish a new connection: [Errno -2] Name or service not known',))
------------------------------------------------------------------------------
Basic SDNGC Health Check | PASS |
------------------------------------------------------------------------------
Basic A&AI Health Check | PASS |
Thanks,
Vidhu
Michael O'Brien
Vidhu, hi, DCAE was in 1.0 of OOM on 28 Sept 2017 - however for R1/Amsterdam the new project DCAEGEN2 was only done in HEAT. There is an effort to move the containers to Kubernetes, an effort to use the developer setup with 1 instead of 7 cdap hadoop nodes and an effort to complete the bridge between the hybrid HEAT/Kubernetes setup - specific only to DCAEGEN2. One or more of these should be in shortly as we work with the DCAE team. You are welcome to help both teams with this large effort.
thank you
/michael
I Chen
Hi Michael,
Just curious, is DCAEGEN2 now available?
While oneclick/createAll.bash includes DCAEGEN2 pod creation, the automation script cd.sh hits the ERROR condition when creating DCAEGEN2 because createAll.bash expects /home/ubuntu/.ssh/onap_rsa to exist. Here's some output from the console log of one of today's Jenkins runs (http://jenkins.onap.info/job/oom-cd/1853/consoleFull):
Michael O'Brien
Yes, DCAEGEN2 works via OOM - I verified it last Friday. However, only in the amsterdam release with the proper onap-parameters.yaml (will be ported to Beijing/master shortly).
see details on
https://lists.onap.org/pipermail/onap-discuss/2018-February/008059.html
The CD jenkins job is running master for now - where DCAEGEN2 is expected not to work yet.
try amsterdam.
/michael
Mor Dabastany
Hi,
I sense that there is a bit of a lack of information here, which I would be happy to acquire.
There is a file that describes the onap environment, "onap-parameters.yaml". I think it would be good practice to provide guidance on how to fill it in (or how to acquire the values that should reside in it).
Michael O'Brien, is there any document available about it?
Michael O'Brien
Mor, You are welcome to help us finish the documentation for OOM-277
The config was changed on Friday - those of us here are playing catch-up on some of the infrastructure changes as we are testing the deploys every couple of days - you are welcome to add to the documentation here - usually the first to encounter an issue/workaround documents it - so the rest of us can benefit.
Most of the content on this tutorial is added by developers like yourself that would like to get OOM deployed and fully functional - at ONAP we self document anything that is missing
OOM-277
There was a section added on friday for those switching from the old-style config to the new - you run a helm purge
The configuration parameters will be specific to your rackspace/openstack config - usually you match your rc export. There is a sample posted from before when it was in the json file in mso - see the screen cap.
The major issue is that so far no one using pure public ONAP has actually deployed a vFirewall yet (mostly due to stability issues with ONAP that are being fixed)
./michael
Michael O'Brien
TODO
good to go : 20170913:2200h
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 37m
onap-aai aai-service-3321436576-790w2 0/1 Init:0/1 1 34m
onap-aai aai-traversal-2747464563-pb8ns 0/1 Running 0 34m
onap-appc appc-dgbuilder-2616852186-htwkl 0/1 Running 0 35m
onap-dcae dmaap-1927146826-6wt83 0/1 Running 0 34m
onap-policy brmsgw-3913376880-qznzv 0/1 Init:0/1 1 35m
onap-policy drools-873246297-twxtq 0/1 Init:0/1 1 35m
onap-policy pap-1694585402-hwkdk 0/1 PodInitializing 0 35m
onap-policy pdp-3638368335-l00br 0/1 Init:0/1 1 35m
onap-portal vnc-portal-1920917086-0q786 0/1 Init:1/5 1 35m
onap-sdc sdc-be-1839962017-16zc3 0/1 Init:0/2 1 34m
onap-sdc sdc-fe-3467675014-qp7f5 0/1 Init:0/1 1 34m
onap-sdc sdc-kb-1998598941-6z0w2 0/1 PodInitializing 0 34m
onap-sdnc sdnc-dgbuilder-3446959187-lspd6 0/1 Running 0 35m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 39m
onap-policy brmsgw-3913376880-qznzv 0/1 Init:0/1 1 36m
onap-policy drools-873246297-twxtq 0/1 Init:0/1 1 36m
onap-policy pdp-3638368335-l00br 0/1 PodInitializing 0 36m
onap-portal vnc-portal-1920917086-0q786 0/1 Init:2/5 1 36m
onap-sdc sdc-be-1839962017-16zc3 0/1 PodInitializing 0 36m
onap-sdc sdc-fe-3467675014-qp7f5 0/1 Init:0/1 1 36m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 40m
onap-policy drools-873246297-twxtq 0/1 PodInitializing 0 38m
onap-portal vnc-portal-1920917086-0q786 0/1 Init:2/5 1 38m
onap-sdc sdc-fe-3467675014-qp7f5 0/1 Running 0 38m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 41m
onap-policy drools-873246297-twxtq 0/1 PodInitializing 0 39m
onap-portal vnc-portal-1920917086-0q786 0/1 Init:3/5 1 39m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 42m
onap-policy drools-873246297-twxtq 0/1 Running 0 40m
onap-portal vnc-portal-1920917086-0q786 0/1 PodInitializing 0 40m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 42m
onap-portal vnc-portal-1920917086-0q786 0/1 PodInitializing 0 40m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 43m
onap-portal vnc-portal-1920917086-0q786 0/1 PodInitializing 0 40m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a | grep 0/1
onap config 0/1 Completed 0 43m
root@ip-172-31-57-55:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-7212s 1/1 Running 1 1d
kube-system kube-dns-2514474280-lmr1k 3/3 Running 3 1d
kube-system kubernetes-dashboard-716739405-qfjgd 1/1 Running 1 1d
kube-system monitoring-grafana-3552275057-gj3x8 1/1 Running 1 1d
kube-system monitoring-influxdb-4110454889-2dq44 1/1 Running 1 1d
kube-system tiller-deploy-737598192-46l1m 1/1 Running 2 1d
onap-aai aai-resources-3302599602-c894z 1/1 Running 0 41m
onap-aai aai-service-3321436576-790w2 1/1 Running 0 41m
onap-aai aai-traversal-2747464563-pb8ns 1/1 Running 0 41m
onap-aai data-router-1397019010-fwqmz 1/1 Running 0 41m
onap-aai elasticsearch-2660384851-chf2n 1/1 Running 0 41m
onap-aai gremlin-1786175088-smqgx 1/1 Running 0 41m
onap-aai hbase-3880914143-9cksj 1/1 Running 0 41m
onap-aai model-loader-service-226363973-nlcnm 1/1 Running 0 41m
onap-aai search-data-service-1212351515-5wkb2 1/1 Running 0 41m
onap-aai sparky-be-2088640323-xs1dg 1/1 Running 0 41m
onap-appc appc-1972362106-lx2t0 1/1 Running 0 41m
onap-appc appc-dbhost-4156477017-9vbf9 1/1 Running 0 41m
onap-appc appc-dgbuilder-2616852186-htwkl 1/1 Running 0 41m
onap-dcae cdap0-4078069992-ql1fk 1/1 Running 0 41m
onap-dcae cdap1-4039904165-r8f2v 1/1 Running 0 41m
onap-dcae cdap2-422364317-827g3 1/1 Running 0 41m
onap-dcae dcae-collector-common-event-1149898616-1f8vt 1/1 Running 0 41m
onap-dcae dcae-collector-dmaapbc-1520987080-9drlt 1/1 Running 0 41m
onap-dcae dcae-controller-2121147148-1kd7f 1/1 Running 0 41m
onap-dcae dcae-pgaas-2006588677-0wlf1 1/1 Running 0 41m
onap-dcae dmaap-1927146826-6wt83 1/1 Running 0 41m
onap-dcae kafka-2590900334-29qsk 1/1 Running 0 41m
onap-dcae zookeeper-2166102094-4jgw0 1/1 Running 0 41m
onap-message-router dmaap-3565545912-2f19k 1/1 Running 0 41m
onap-message-router global-kafka-3548877108-ns5v6 1/1 Running 0 41m
onap-message-router zookeeper-2697330950-9fbmf 1/1 Running 0 41m
onap-mso mariadb-2019543522-nqqbz 1/1 Running 0 41m
onap-mso mso-2505152907-pg17g 1/1 Running 0 41m
onap-policy brmsgw-3913376880-qznzv 1/1 Running 0 41m
onap-policy drools-873246297-twxtq 1/1 Running 0 41m
onap-policy mariadb-922099840-x5xsq 1/1 Running 0 41m
onap-policy nexus-2268491532-025jf 1/1 Running 0 41m
onap-policy pap-1694585402-hwkdk 1/1 Running 0 41m
onap-policy pdp-3638368335-l00br 1/1 Running 0 41m
onap-portal portalapps-3572242008-qr51z 1/1 Running 0 41m
onap-portal portaldb-2714869748-wxtvh 1/1 Running 0 41m
onap-portal portalwidgets-1728801515-33bm7 1/1 Running 0 41m
onap-portal vnc-portal-1920917086-0q786 1/1 Running 0 41m
onap-robot robot-1085296500-d3l2g 1/1 Running 0 41m
onap-sdc sdc-be-1839962017-16zc3 1/1 Running 0 41m
onap-sdc sdc-cs-428962321-z87js 1/1 Running 0 41m
onap-sdc sdc-es-227943957-5ssh3 1/1 Running 0 41m
onap-sdc sdc-fe-3467675014-qp7f5 1/1 Running 0 41m
onap-sdc sdc-kb-1998598941-6z0w2 1/1 Running 0 41m
onap-sdnc sdnc-250717546-476sv 1/1 Running 0 41m
onap-sdnc sdnc-dbhost-2348786256-wsf9z 1/1 Running 0 41m
onap-sdnc sdnc-dgbuilder-3446959187-lspd6 1/1 Running 0 41m
onap-sdnc sdnc-portal-4253352894-73mzq 1/1 Running 0 41m
onap-vid vid-mariadb-2940400992-twp1r 1/1 Running 0 41m
onap-vid vid-server-377438368-mkgpc 1/1 Running 0 41m
Cyril Nleng
Hi,
I just went through the installation tutorial:
1 - I am wondering how OpenStack impacts ONAP operations?
2 - When will the DCAE component be available on Kubernetes?
Thanks,
Michael O'Brien
DCAE is in OOM Kubernetes as of 20170913
onap-dcae cdap0-4078069992-ql1fk 1/1 Running 0 41m
onap-dcae cdap1-4039904165-r8f2v 1/1 Running 0 41m
onap-dcae cdap2-422364317-827g3 1/1 Running 0 41m
onap-dcae dcae-collector-common-event-1149898616-1f8vt 1/1 Running 0 41m
onap-dcae dcae-collector-dmaapbc-1520987080-9drlt 1/1 Running 0 41m
onap-dcae dcae-controller-2121147148-1kd7f 1/1 Running 0 41m
onap-dcae dcae-pgaas-2006588677-0wlf1 1/1 Running 0 41m
onap-dcae dmaap-1927146826-6wt83 1/1 Running 0 41m
onap-dcae kafka-2590900334-29qsk 1/1 Running 0 41m
onap-dcae zookeeper-2166102094-4jgw0 1/1 Running 0 41m
Cyril Nleng
Which branch are those changes available in?
Kiran Kamineni
Is there any reason for using the 8880 port instead of the 8080 port when installing Rancher?
8880 port seems to be blocked in our environment and using 8080 was working fine. I hope I will not run into other issues because I am using 8080?
Borislav Glozman
Kiran Kamineni, You can use whatever port you prefer. It should cause no issues.
Mohamed Aly ould Oumar
Hi, I managed to install all ONAP components using Kubernetes. They seem to be running, and I can access the Portal and authenticate.
Problem:
I cannot access SDC. It always gives the error "Sorry, you are not authorized to view this page, contact ...the administrators".
I tried all the available users (demo, cs0008, jh0003) but none of them works.
Can I get a few bits of help regarding this?
Thanks in advance.
Mohamed Aly, Aalto University.
Borislav Glozman
Please try accessing it from the VNC. (<your node IP>:30211).
Shane Daniel
I am having the same issue as Mohamed. I am accessing it via the VNC portal on port 30211
Mohamed Aly ould Oumar
Hi, thank you for your reply. I am accessing it from the VNC node on port 30211, but it doesn't work and gives the same error.
Any update on this issue??
Mike Elliott
First verify that your portal containers are running in K8s (including the vnc-portal). Take note of the 2/2 and 1/1 Ready states. If there is a 0 on the left of those numbers, then the container is not fully running.
kubectl get pods --all-namespaces -o=wide
onap-portal portalapps-4168271938-gllr1 2/2 Running
onap-portal portaldb-2821262885-rs4qj 2/2 Running
onap-portal portalwidgets-1837229812-r8cn2 1/1 Running
onap-portal vnc-portal-2366268378-c71z9 1/1 Running
If the containers (pods) are in a good state, ensure your k8s host has a routable IP address and substitute it into the example URL below:
http://<ip address>:30211/vnc.html?autoconnect=1&autoscale=0&quality=3
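As a minimal sketch of the Ready-state check described above, the snippet below parses the text printed by kubectl get pods --all-namespaces and lists the pods whose READY count is below the total. The function name not_ready_pods and the rule that Completed pods (such as the one-shot config pod) are ignored are assumptions made for illustration, not part of OOM.

```python
def not_ready_pods(kubectl_output):
    """Return (namespace, pod, ready, status) for pods whose ready count is below the total."""
    pending = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Expect at least: NAMESPACE NAME READY STATUS
        if len(fields) < 4 or "/" not in fields[2]:
            continue  # header line or unrelated text
        namespace, name, ready, status = fields[:4]
        ready_count, _, total = ready.partition("/")
        if not (ready_count.isdigit() and total.isdigit()):
            continue
        # Assumption: a Completed pod legitimately stays at 0/1 and is not a problem.
        if status == "Completed":
            continue
        if int(ready_count) < int(total):
            pending.append((namespace, name, ready, status))
    return pending


if __name__ == "__main__":
    sample = """\
NAMESPACE NAME READY STATUS RESTARTS AGE
onap config 0/1 Completed 0 42m
onap-policy drools-873246297-twxtq 0/1 Running 0 40m
onap-portal vnc-portal-1920917086-0q786 0/1 PodInitializing 0 40m
onap-portal portalapps-3572242008-qr51z 1/1 Running 0 41m
"""
    for pod in not_ready_pods(sample):
        print(pod)
```

An empty result means every pod reports all of its containers as ready, which is the state to wait for before opening the VNC URL.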
Mohamed Aly ould Oumar
This is not our problem, thanks anyway.
Mandeep Singh
I am also facing the same issue.
From the Wireshark logs, the GET /sdc2/rest/version API call is having some issues.
Pods seem to be running fine:
onap1-portal portaldb-3931461499-x03wg 2/2 Running 0 1h
onap1-portal portalwidgets-3077832546-jz647 1/1 Running 0 1h
onap1-portal vnc-portal-3037811218-hj3wj 1/1 Running 0 1h
onap1-sdc sdc-be-3901137770-h7d65 2/2 Running 0 1h
onap1-sdc sdc-cs-372240393-kqlw7 1/1 Running 0 1h
onap1-sdc sdc-es-140478562-r1fx9 1/1 Running 0 1h
onap1-sdc sdc-fe-3405834798-pjvkh 2/2 Running 0 1h
onap1-sdc sdc-kb-3782380369-hzb6q 1/1 Running 0 1h
Mohamed Aly ould Oumar
Any update on this issue??
Samuel Robillard
Hi, is there a page available where we could find an up-to-date list or diagram of the dependencies between the different ONAP components? Also, is there a breakdown of the memory requirements for the various OOM components?
Mike Elliott
Hi Samuel,
No official documentation on the dependencies at this point. But a very good idea to add. I will look into doing this.
For now you can see the dependencies in each of the deployment descriptors, as in the AAI traversal example (see below), which depends on the aai-resources and hbase containers before it starts up. In OOM we make use of Kubernetes init-containers and readiness probes to implement the dependencies. This prevents the main container in the deployment descriptor from starting until its dependencies are "ready".
oom/kubernetes/aai/templates] vi aai-traversal-deployment.yaml
pod.beta.kubernetes.io/init-containers: '[
{
"args": [
"--container-name",
"hbase",
"--container-name",
"aai-resources"
],
"command": [
"/root/ready.py"
],
"env": [
{
"name": "NAMESPACE",
"valueFrom": {
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.namespace"
}
}
}
],
"image": "{{ .Values.image.readiness }}",
"imagePullPolicy": "{{ .Values.pullPolicy }}",
"name": "aai-traversal-readiness"
}
]'
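Until a dependency diagram exists, the same information can be read out of the annotation mechanically. The sketch below is one possible way to do that: it parses an init-containers annotation value like the one above and collects the values that follow each --container-name argument. The function name readiness_dependencies is hypothetical, and the sketch assumes the annotation value is valid JSON, as in this example.

```python
import json


def readiness_dependencies(annotation_json):
    """Map each init container name to the container names it waits for."""
    deps = {}
    for container in json.loads(annotation_json):
        args = container.get("args", [])
        # Each "--container-name" flag is followed by the name of one dependency.
        waits_for = [
            args[i + 1]
            for i in range(len(args) - 1)
            if args[i] == "--container-name"
        ]
        deps[container.get("name", "<unnamed>")] = waits_for
    return deps


if __name__ == "__main__":
    annotation = """
    [
      {
        "args": ["--container-name", "hbase", "--container-name", "aai-resources"],
        "command": ["/root/ready.py"],
        "image": "{{ .Values.image.readiness }}",
        "imagePullPolicy": "{{ .Values.pullPolicy }}",
        "name": "aai-traversal-readiness"
      }
    ]
    """
    print(readiness_dependencies(annotation))
```

Running this over the annotation in every deployment yaml would give the edges for a dependency graph, with the caveat that it only sees dependencies expressed through this readiness init-container pattern.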
Michael O'Brien
Samuel, To add to the dependency discussion by Mike - Ideally I would like to continue the deployment diagram below with the dependencies listed in the yamls he refers to
The diagram can be edited by anyone - I will take time this week and update it.
Overall Deployment Architecture#Version1.1.0/R1
/michael
Gopinath Taget
There seems to be an error in the VFC service definition template when creating all services on an ubuntu 16.04 with 64 GB RAM:
Creating namespace **********
namespace "onap-vfc" created
Creating registry secret **********
secret "onap-docker-registry-key" created
Creating deployments and services **********
Error: yaml: line 27: found unexpected end of stream
The command helm returned with error code 1
Michael O'Brien
Gopinath,
Hi, VFC is still a work in progress - the VFC team is working through issues with their containers. You don't currently need VFC for ONAP to function - you can comment it out of the oneclick/setenv.bash helm line (ideally we would leave out services that are still WIP).
thank you
/michael
Gopinath Taget
Thanks Michael O'Brien!
Gopinath Taget
Hi Michael,
Checking back to see if VFC container issues are resolved and I can continue with the full install including other components?
Thanks!
Gopinath
Rajesh Mangal
Hi,
I am trying to bring up ONAP using Kubernetes. Can you please tell me whether I should pull only OOM release-1.0.0, or whether a pull from the master branch would also be fine to get ONAP up and running and to run the demo on it?
Thanks!
Michael O'Brien
Rajesh, Hi, the latest master is 1.1/R1 - the wiki is now targeting 1.1 - I'll remove the 1.0 link. Be aware that ONAP in general is undergoing stabilization at this point.
/michael
Samuel Robillard
Hi,
I am getting the same error as a few people above when it comes to accessing SDC where it says I am not authorized to view this page, and it also gives me a 500 error. My initial impression is that this might be because I cannot reach the IP corresponding to the sdc.api.simpledemo.openecomp.org in the /etc/hosts file from my vnc container.
Could anybody confirm if this may cause an issue? And if so, which container/host/service IP should be paired with the sdc url?
Thanks,
Sam
Samuel Robillard
Actually, I believe the resolution is correct, as it maps to the sdc-fe service, and if I change the IP to any other service the SDC web page times out. Also, if I curl <sdc-url>:8080 I do get information back. I am still not sure what might be causing this issue. Currently I am looking through the SDC logs for hints, but no luck as of yet.
Samuel Robillard
The request is failing on the sdc-fe side. I posted the outputs of a tcpdump from the sdc-fe container here https://pastebin.com/bA46vqUk
Michael O'Brien
There are general SDC issues - I'll look them up and paste them. We are also investigating issues with the sdc-be container
see SDC-451 and INT-106
Syed Atif Husain
is there a workaround for this issue of accessing SDC where it says I am not authorized to view this page?
kowsalya v
I am also facing the same SDC issue, due to sdc-es not being ready.
sdc-es shows the error below in its log.
7-10-11T17:49:50+05:30] INFO: HTTP Request Returned 404 Not Found: Object not found: chefzero://localhost:8889/environments/AUTO
10/11/2017 5:49:50 PM
10/11/2017 5:49:50 PM================================================================================
10/11/2017 5:49:50 PMError expanding the run_list:
10/11/2017 5:49:50 PM================================================================================
10/11/2017 5:49:50 PMUnexpected API Request Failure:
10/11/2017 5:49:50 PM-------------------------------
10/11/2017 5:49:50 PMObject not found: chefzero://localhost:8889/environments/AUTO
10/11/2017 5:49:50 PMPlatform:
10/11/2017 5:49:50 PM---------
10/11/2017 5:49:50 PMx86_64-linux
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] ERROR: Running exception handlers
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] ERROR: Exception handlers complete
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] FATAL: Stacktrace dumped to /root/chef-solo/cache/chef-stacktrace.out
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
10/11/2017 5:49:50 PM[2017-10-11T17:49:50+05:30] ERROR: 404 "Not Found"
10/11/2017 5:49:51 PM[2017-10-11T17:49:50+05:30] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: Started chef-zero at chefzero://localhost:8889 with repository at /root/chef-solo
10/11/2017 5:49:52 PM One version per cookbook
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: Forking chef instance to converge...
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: *** Chef 12.19.36 ***
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: Platform: x86_64-linux
10/11/2017 5:49:52 PM[2017-10-11T17:49:52+05:30] INFO: Chef-client pid: 927
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] INFO: Setting the run_list to ["role[elasticsearch]"] from CLI options
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] WARN: Run List override has been provided.
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] WARN: Original Run List: [role[elasticsearch]]
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] WARN: Overridden Run List: [recipe[sdc-elasticsearch::ES_6_create_kibana_dashboard_virtualization]]
10/11/2017 5:49:53 PM[2017-10-11T17:49:53+05:30] INFO: HTTP Request Returned 404 Not Found: Object not found: chefzero://localhost:8889/environments/AUTO
10/11/2017 5:49:53 PM
Michael O'Brien
R1 is still in RC0 fix mode as we prep for the release - pull yesterday's (13th).
Mandeep's change https://gerrit.onap.org/r/#/c/18803/ fixes OOM-359 and SDC-451, and some of OOM-110.
Actually those are for sdc-be. I see a chef error on sdc-es, but the pod starts up OK (need to verify the endpoints though). Also, this pod is not slated for the ELK filebeat sister container - it should be.
I am getting a chef exit on missing ELK components in sdc-es, even though this one is not slated for the sister filebeat container - likely a script reused across all pods in SDC. I will take a look.
see original OOM-110 commit
https://gerrit.onap.org/r/#/c/15941/1
Likely we can ignore this one in sdc-es - need to check endpoints though - pod comes up ok - regardless of the failed cookbook.
root@obriensystemsu0:~/onap/oom/kubernetes/oneclick# kubectl logs -f -n onap-sdc sdc-es-2514443912-nt3r3
/michael
Michael O'Brien
todo add to devops
root@obriensystemsu0:~/onap/oom/kubernetes/oneclick# kubectl logs -f -n onap-aai aai-traversal-3982333463-vb89g aai-traversal
Cloning into 'aai-config'...
[2017-10-14T10:50:36-05:00] INFO: Started chef-zero at chefzero://localhost:1 with repository at /var/chef/aai-config
One version per cookbook
environments at /var/chef/aai-data/environments
[2017-10-14T10:50:36-05:00] INFO: Forking chef instance to converge...
Starting Chef Client, version 13.4.24
[2017-10-14T10:50:36-05:00] INFO: *** Chef 13.4.24 ***
[2017-10-14T10:50:36-05:00] INFO: Platform: x86_64-linux
[2017-10-14T10:50:36-05:00] INFO: Chef-client pid: 43
[
Vijendra Rajput
Hi Michael,
I am trying to set up ONAP using Kubernetes. I am using Rancher to set up the Kubernetes cluster. I have 5 machines with 16 GB of memory each. Kubernetes was configured successfully. When I run createAll.bash to set up the ONAP application, some of the components are configured and running, but others fail with an "ImagePullBackOff" error.
When I pull the images independently, I can download them from nexus successfully, but not when running through the createAll script. I went through the script and everything seems fine, so I cannot work out what is wrong. Could you please help me understand the issue?
~Vijendra
Michael O'Brien
Vijendra,
Hi, try running the docker pre pull script on all of your machines first. Also you may need to duplicate /dockerdata-nfs across all machines - manually or via a shared drive.
/michael
Samuel Robillard
Hi,
I started getting an error with the MSO when I redeployed yesterday
Starting Xvfb on display :88 with res 1280x1024x24
Executing robot tests at log level TRACE
==============================================================================
OpenECOMP ETE
==============================================================================
OpenECOMP ETE.Robot
==============================================================================
OpenECOMP ETE.Robot.Testsuites
==============================================================================
.
.
.
------------------------------------------------------------------------------
Basic SDNGC Health Check | PASS |
------------------------------------------------------------------------------
Basic A&AI Health Check | PASS |
------------------------------------------------------------------------------
Basic Policy Health Check | PASS |
------------------------------------------------------------------------------
Basic MSO Health Check | FAIL |
503 != 200
------------------------------------------------------------------------------
Basic ASDC Health Check | PASS |
------------------------------------------------------------------------------
Basic APPC Health Check | PASS |
------------------------------------------------------------------------------
Basic Portal Health Check | PASS |
------------------------------------------------------------------------------
Basic Message Router Health Check | PASS |
------------------------------------------------------------------------------
Basic VID Health Check | PASS |
------------------------------------------------------------------------------
Basic Microservice Bus Health Check | FAIL |
Variable '${MSB_ENDPOINT}' not found. Did you mean:
${MSO_ENDPOINT}
${MR_ENDPOINT}
------------------------------------------------------------------------------
OpenECOMP ETE.Robot.Testsuites.Health-Check :: Testing ecomp compo... | FAIL |
11 critical tests, 8 passed, 3 failed
11 tests total, 8 passed, 3 failed
==============================================================================
OpenECOMP ETE.Robot.Testsuites | FAIL |
11 critical tests, 8 passed, 3 failed
11 tests total, 8 passed, 3 failed
==============================================================================
OpenECOMP ETE.Robot | FAIL |
11 critical tests, 8 passed, 3 failed
11 tests total, 8 passed, 3 failed
==============================================================================
OpenECOMP ETE | FAIL |
11 critical tests, 8 passed, 3 failed
11 tests total, 8 passed, 3 failed
==============================================================================
Output: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_11572/output.xml
Log: /var/opt/OpenECOMP_ETE/html/logs/ete/ETE_11572/log.html
Anybody else get this error/may know how to determine the root cause of this?
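When comparing runs across days, it can help to reduce a report like the one above to just the failing checks. The following is a rough sketch that scans the robot console text for lines ending in "| FAIL |" and returns the check names. The function name failed_checks is hypothetical, and skipping lines that start with "OpenECOMP ETE" (to drop the suite-level summaries) is an assumption based on this particular output.

```python
import re

# A result line looks like: "Basic MSO Health Check | FAIL |"
RESULT_LINE = re.compile(r"^(?P<name>.+?)\s*\|\s*(?P<status>PASS|FAIL)\s*\|\s*$")


def failed_checks(robot_output, suite_prefix="OpenECOMP ETE"):
    """Return the names of individual checks reported as FAIL."""
    failures = []
    for line in robot_output.splitlines():
        match = RESULT_LINE.match(line)
        if not match:
            continue  # separators, error details, totals
        name = match.group("name")
        # Assumption: suite summary lines start with the suite name, so skip them.
        if name.startswith(suite_prefix):
            continue
        if match.group("status") == "FAIL":
            failures.append(name)
    return failures


if __name__ == "__main__":
    sample = """\
Basic Policy Health Check | PASS |
------------------------------------------------------------------------------
Basic MSO Health Check | FAIL |
503 != 200
------------------------------------------------------------------------------
Basic Microservice Bus Health Check | FAIL |
OpenECOMP ETE.Robot | FAIL |
"""
    print(failed_checks(sample))
```

The detail line under each failure (for example "503 != 200") is deliberately ignored here; it would need to be read from the log.html referenced at the end of the report.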
Michael O'Brien
Yes, we have been getting this since last Friday - I have been too busy to raise an issue like normal. This is not as simple as onap-parameters.xml; it looks like a robot change related to the SO rename - I will post a JIRA/workaround shortly. In any case, SO is not fully up on OOM/Heat currently.
20171019 - see the same thing on rackspace today
Also - nice dependency diagram you started.
/michael
Edmund Haselwanter
Same here, the health check is failing. I am seeing this in OOM as well as heat_openstack. SO-246
Radhika Kaslikar
Hi ,
I have brought up ONAP using the OOM master branch, which I pulled yesterday. But on running the health check I am facing similar issues to those discussed above, where MSO fails with a 503 error, and I also see Portal failing with a 404 error.
Can you please let us know if there is any workaround for this issue, or if there is any build where the components necessary for running the vFW/vDNS demos (Portal, SDC, AAI, SO, VID, SDNC, Policy and DCAE) are healthy?
Thanks,
Radhika
Michael O'Brien
MSO, APPC, SDNC, Policy regularly pass/fail on a daily basis - as we are in branch stabilization mode for R1 - join the triage party below
INT-106
/michael
Edmund Haselwanter
Michael O'Brien
of course, but in the spirit of "open" source - everything has access - hence 777 everywhere - until production deployments that is!
Edmund Haselwanter
how do I set/correct the missing values in the health check? How do I know if everything should be working with a current deployment?
Rahul Sharma
For the MSO Basic HealthCheck failure, see if the last comment in this JIRA helps: https://jira.onap.org/browse/SO-208?focusedCommentId=15724&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15724
Michael O'Brien
MSO passed for 24 hours last Tuesday - it was a good day! I predict more of these next week.
Stay tuned to channel 106 - INT-106
Radhika Kaslikar
Hi ,
On running health check, MSO is still failing
Basic MSO Health Check | FAIL |
503 != 200
And on checking the MSO container logs I see the following error :
2017-11-06 19:14:21,766||ServerService Thread Pool -- 75|contextInitialized||||ERROR|AvailabilityError|| MSO-RA-5210E Configuration error:Unknown. MSO Properties failed to initialize completely
2017-11-06 19:14:21,786||ServerService Thread Pool -- 75|contextInitialized||||ERROR|AvailabilityError|| MSO-GENERAL-9400E Exception: java.lang.NullPointerException - at org.openecomp.mso.openstack.utils.CloudConfigInitializer.contextInitialized(CloudConfigInitializer.java:65) -
Can anyone please tell me how I can solve this?
Before running the health check all the pods were in running state.
Beili Zhou
Michael O'Brien:
In the "Delete/Rerun config-init container for /dockerdata-nfs refresh" section, the steps for deleting the filesystem are as follows:
This would not be good for a cluster configuration where the directory `dockerdata-nfs` is mounted as suggested in "Cluster Configuration (optional - do not use if your server/client are co-located)":
The description in OOM-257 (DevOps: OOM config reset procedure for new /dockerdata-nfs content) is friendlier, where the step is described as:
Michael O'Brien
A persistent NFS mount is recommended in the official docs - this is a collaborative wiki - as in join the party of overly enthusiastic developers - in my case I run on AWS EBS so not an issue - you are welcome to help document the ecosystem.
The sky at OOM is a very nice shade of blue!
Sorry I am super excited about the upcoming developer conference on 11 Dec.
/michael
ramki krishnan
Hi Michael,
In my setup, I am able to start the ONAP components only if all the images already are downloaded using prepull_docker.sh. So far, I have been able to start all aai components using "createAll.bash -n onap -a aai" after the images have been downloaded using prepull_docker.sh.
Here are the challenges I am facing
onap-clamp clamp-2925721051-g814q 0/1 CrashLoopBackOff 144 12h
onap-consul consul-agent-3312409084-lwdvv 0/1 CrashLoopBackOff 162 13h
onap-consul consul-server-1173049560-mk40v 0/1 CrashLoopBackOff 163 13h
onap-consul consul-server-1173049560-pjpm5 0/1 CrashLoopBackOff 163 13h
onap-consul consul-server-1173049560-rf257 0/1 CrashLoopBackOff 163 13h
onap-vfc vfc-workflow-2530549902-19tw0 0/1 CrashLoopBackOff 166 13h
Your suggestions on next steps are much appreciated.
Thanks,
Ramki
Beili Zhou
@ramki krishnan
You can use the following command to check the logs and see why the pod failed with `CrashLoopBackOff`:
In your case, the command would be:
ramki krishnan
Thanks Beili. Below is the error I get for clamp. Looks like clamp is expecting some configuration, specifically password. Any clues on the specific configuration which needs to be updated?
***************************
APPLICATION FAILED TO START
***************************
Description:
Binding to target org.onap.clamp.clds.config.EncodedPasswordBasicDataSource@53ec2968 failed:
Property: spring.datasource.camunda.password
Value: strong_pitchou
Reason: Property 'password' threw exception; nested exception is java.lang.NumberFormatException: For input string: "st"
Action:
Update your application's configuration
Michael O'Brien
Use the recommended subset (essentially the ONAP 1.0 components from the original seed code in Feb 2017 - these work with the vFirewall use case) until we stabilize the R1 release.
Clamp, aaf, and vfc are currently still being developed - there are usually 2 to pod failures in these components - I will post the JIRAs. These are known issues and are being worked on in the OOM JIRA board.
https://jira.onap.org/secure/RapidBoard.jspa?rapidView=41&view=planning&selectedIssue=OOM-150
OOM-333
OOM-324
OOM-408
You don't need these 3 components to run the vFirewall - for now I would exclude them in HELM_APPS in setenv.bash - later when they are stable you can add them back.
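For anyone who prefers to script that exclusion, here is a small sketch that drops named apps from a HELM_APPS line. It assumes the array is written as a bash array of single-quoted names, in the form HELM_APPS=('mso' 'sdc' ...), and the function name exclude_helm_apps is hypothetical. It only transforms text; writing the result back to setenv.bash is left to the reader.

```python
import re


def exclude_helm_apps(setenv_text, excluded):
    """Return the setenv text with the named apps removed from the HELM_APPS array."""
    # Assumption: the array looks like HELM_APPS=('a' 'b' ...) and may wrap across lines.
    pattern = re.compile(r"^(HELM_APPS=\()(.*?)(\))", re.MULTILINE | re.DOTALL)

    def rewrite(match):
        apps = re.findall(r"'([^']+)'", match.group(2))
        kept = [app for app in apps if app not in excluded]
        return match.group(1) + " ".join(f"'{app}'" for app in kept) + match.group(3)

    return pattern.sub(rewrite, setenv_text, count=1)


if __name__ == "__main__":
    line = "HELM_APPS=('mso' 'sdc' 'clamp' 'aaf' 'vfc' 'robot')\n"
    print(exclude_helm_apps(line, {"clamp", "aaf", "vfc"}), end="")
```

Keeping the excluded names in one set makes it easy to add them back later, once those components are stable.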
ramki krishnan
Many thanks Michael.
Alex Lee
Hi, @Michael O'Brien. As we can see in https://git.onap.org/integration/tree/version-manifest/src/main/resources/docker-manifest.csv,
all the docker image tags are changing to the R1 release. But for now, the images for OOM/master still have the tag 1.1-STAGING-latest.
Michael O'Brien
Yes, been thinking about this for some time - and I have seen issues where we don't pick up problems we should have with for example the openecomp to onap refactor earlier this week - As you know from the TSC meeting yesterday - the manifest is still in flux in the move to the dockerhub versions
OOM-432
OOM-438
Alex, can't tell your company from your email - you are welcome in the Wed 10EDT OOM meeting where we can farm out work items like this.
thank you
/michael
Alex Lee
Thanks for your explanations Michael O'Brien.
Another question: when the docker images for the Amsterdam release are ready, will the docker repo for ONAP still be nexus3 at onap?
I ask because in
OOM-438 - Move oomk8s docker images to new onap repo on dockerhub
you are moving some images to the new repo called onap.
Michael O'Brien
I am not sure yet - but I would expect that master continues to pull from nexus/nexus3, and the R1 branch pulls from dockerhub - but need to verify - put a watch on the JIRA - I usually update them with critical info/links/status
/michael
Alex Lee
ok. thanks a lot, michael
Michael O'Brien
Stay with helm v2.3 - do not upgrade to 2.6 or vnc-portal will fail - see OOM-441
Syed Atif Husain
I have successfully started ONAP on Kubernetes with the apps below in setenv.sh. All pods show 1/1 Running, but when I log in to the portal I only see SDC. Why are the other modules not appearing in the portal?
HELM_APPS=('consul' 'msb' 'mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'log' 'cli' 'multicloud' 'clamp' 'vnfsdk' 'uui' 'aaf' 'vfc' 'kube2msb')
Rahul Sharma
Syed Atif Husain: Are you logged on as demo user?
Syed Atif Husain
Rahul Sharma I tried cs0008, the catalog designer role
Rahul Sharma
Syed Atif Husain: That would only show SDC. Try using demo/demo123456!
Syed Atif Husain
Thanks Rahul Sharma. I have encountered another issue: SDC keeps giving me a 500 error saying I am not authorized to view this page when I log in as cs0008. I see in the comments above that this is a known issue. Is there a workaround for this, or can I pull older/stable code to avoid it?
tuan nguyen
This is a great accomplishment for us to start playing with - thanks a lot Amar and Prakash for your effort putting things together.
One thing I mentioned earlier in the call: we probably need to review and upgrade from Docker 1.12 (2 years old). Docker moved to 1.13 last year, and now to Docker CE (public) and Docker EE (Enterprise), where version numbers follow the year (2017 = 17.x, 2018 = 18.x).
Also, Rancher is not mandatory just to build Kubernetes. I have met several customers running in production where Kubernetes 1.6, 1.7 or 1.8 can be built quite easily with Kubeadm in a few minutes (skipping Rancher). Rancher is good for other use cases where customers need a multi-orchestrator environment (K8s, Mesos, Swarm). I don't see real value in having Rancher in our ONAP document, where it might confuse people into thinking that Rancher is mandatory just for bringing up K8s.
Another thing: I attended the last Docker conference, and Kubernetes will soon support Containerd, where the CLI command to run will be "crictl" rather than "kubectl", allowing Kubernetes to work directly with Containerd and thus improving performance for Kubernetes, which ONAP will fully benefit from (GA will be end of 2017). We probably need to follow closely where the Kubernetes community is heading and update our documentation accordingly. It is difficult to update our documentation every month, but keeping up with Kubernetes is a good way to stay current, in my opinion...
Michael O'Brien
Good discussion,
I agree - we will move from Docker 1.12 when we move from Rancher 1.6.10 to Rancher 2.0 - where we can use 17.x - but it is a Rancher + Docker + Kubernetes config issue.
Rancher is not required - we tried minikube, there are also commercial CaaS frameworks - however Rancher is the simplest and fastest approach at the moment.
You are welcome to join the OOM call at 10AM EDT on Wed - we usually go through the JIRA board - and the Kubeadm work sounds like a good Epic to work on. We are very interested in various environments and alternatives for running our pods - please join.
There is also a daily OOM blitz on stabilizing the branch and deploying the vFirewall use case that you are welcome to attend - 1200 EDT (noon), until either KubeCon on 4 Dec or the ONAP developer conference on 11 Dec.
https://lists.onap.org/pipermail/onap-discuss/2017-November/006483.html
I updated the page to state that Rancher is just "one" way to get your pods up - add any subpages for other types of frameworks as you wish.
/michael
tuan nguyen
Great job Michael - I hope we get more and more people trying ONAP and giving feedback, and more great contributors like you!
Sen Shu
Hi all.
I have a question.
On the page for installation using HEAT,
148 vCPUs are required, but this page says
64 vCPUs are needed.
Why is there such a big difference?
Are there differences in the items that can be installed?
best regards
sen
Michael O'Brien
Sen,
Good question. As you know, CPU can be over-provisioned - threads will just queue more - unlike RAM and HD, which cannot be shared. 64 vCPUs is the recommended number based on bringing up the system on 64 and 128 core systems on AWS - we top out at 44 cores during startup (without DCAE - so this may be multiplied by 3/2 in that case, as DCAE has 1/3 of the containers in ONAP). Therefore, for non-staging/non-production systems you will not gain anything from having more than 44 vCores until we start hammering the system with real-world VNF traffic. The HEAT provisioning is a result of the fact that the docker allocation model is spread across multiple silo VMs and not flat like in Kubernetes currently. Therefore some servers may only use 1/8 where others may peak at 7/8. It all depends on how you use ONAP.
You can get away with 8 vCores during development - ONAP will start up in 11 min instead of the 7 min it takes on 32 vCores.
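The 3/2 factor mentioned above follows from DCAE being about 1/3 of all containers: the non-DCAE part is then 2/3 of the total, so the total is 3/2 of the measured figure. A minimal sketch of that arithmetic, assuming peak CPU use scales with container count (an assumption of this sketch, not a measured result):

```shell
#!/usr/bin/env bash
# Peak observed during startup without DCAE, as reported in this comment.
peak_vcores_without_dcae=44

# Scale by 3/2 to account for DCAE (integer arithmetic: 44 * 3 / 2 = 66).
estimate_peak_with_dcae() {
  local base="$1"
  echo $(( base * 3 / 2 ))
}

estimate_peak_with_dcae "$peak_vcores_without_dcae"   # prints 66
```

The estimate lands just above the recommended 64 vCPUs, which is tolerable because CPU can be over-provisioned.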
Since DCAE is not currently in Kubernetes in R1, you need to account for it only in OpenStack.
Depending on the VNF use case you don't need the whole system yet; for example, the vFW only needs 1.0.0-era components, whereas vVoLTE and vCPE will need new R1 components - see the HELM_APPS recommendation in this wiki.
Similar ONAP HEAT deployment (without DCAE or the OPEN-O VM - triple the size in that case) - this will run the vFirewall but not to closed-loop.
/michael
Sen Shu
michael,
thank you for answering my question.
That makes it easier to understand.
I'll use the HEAT installation and temporarily allocate 148 vCPUs because I need to use DCAE.
I'll also see the page you referenced.
thanks
Sen
Joey Sullivan
I was getting the following error when running "./createConfig.sh -n onap"
There was something wrong with helm tiller rbac config.
I found the solution here
https://github.com/kubernetes/helm/issues/3130
https://docs.bitnami.com/kubernetes/how-to/configure-rbac-in-your-kubernetes-cluster/
This is what I did to fix the issues in my deployment.
Vaibhav Chopra
1) Whenever you delete the configuration with
# helm delete --purge onap-config
release "onap-config" deleted
it deletes the config pod. You also need to delete the namespace for a complete cleanup:
kubectl delete namespace onap
2) Another observation concerns the kubectl version:
The command below currently installs the latest version, 1.8.4:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
To download a specific version, replace the
$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)
portion of the command with the specific version. I think the difference between the two versions is in the init container handling: in v1.8.3 it waits for the dependent container to come up, and as a result the dependent container, such as vnc-portal, sometimes times out for me.
For example, drools checking for brmsgw to come up:
2017-11-27 08:16:46,757 - INFO - brmsgw is not ready.
2017-11-27 08:16:51,759 - INFO - Checking if brmsgw is ready
2017-11-27 08:16:51,826 - INFO - brmsgw is not ready.
2017-11-27 08:16:56,831 - INFO - Checking if brmsgw is ready
2017-11-27 08:16:56,877 - INFO - brmsgw is ready!
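Pinning the kubectl version as described in point 2 can be sketched as a small helper that builds the download URL. The URL pattern is the one from the curl command above; the function name `kubectl_url` is made up for this example, and whether the bucket still serves a given release is not checked here.

```shell
#!/usr/bin/env bash
# Build the download URL for a pinned kubectl release instead of stable.txt.
kubectl_url() {
  local version="$1"   # e.g. v1.8.4
  echo "https://storage.googleapis.com/kubernetes-release/release/${version}/bin/linux/amd64/kubectl"
}

kubectl_url v1.8.4

# To actually download (needs network access):
#   curl -LO "$(kubectl_url v1.8.4)"
```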
Michael O'Brien
Your 1.8.4 vs 1.8.3 version observation is good - we have issues with vnc-portal under the latest 1.8.8 - will look more into this - thank you
see OOM-441 if you would like to comment
/michael
ATUL ANGRISH
Hi Michael,
I am trying to configure and deploy ONAP components using Kubernetes, but when I run the command below to check the pod status,
kubectl get pods --all-namespaces
there is a problem with the SDC and VNC components. They do not reach the up state.
onap-sdc sdc-be-754421819-b696x 1/2 ErrImagePull 0 1h
onap-sdc sdc-fe-902103934-qmf3g 0/2 Init:0/1 4 1h
onap-portal vnc-portal-3680188324-kjszk 0/1 Init:2/5 3 1h
I have used Docker 1.12 along with Rancher 1.6.10 and Helm 2.3.
I guess something changed in the chef scripts, but I don't know the reason.
When I describe the sdc-be pod I get this error:
Normal Started 24m kubelet, k8s-2 Started container
Normal Pulling 21m kubelet, k8s-2 pulling image "docker.elastic.co/beats/filebeat:5.5.0"
Warning Failed 21m kubelet, k8s-2 Failed to pull image "nexus3.onap.org:10001/openecomp/sdc-backend:1.1-STAGING-latest": rpc error: code = 2 desc = net/http: request canceled
Could you please help me with that?
ATUL ANGRISH
Hi,
We are facing an issue while deploying pods, mainly SDC, using Kubernetes.
root@k8s-2:/# kubectl get pods --all-namespaces -a
onap-aai aai-resources-898583818-6ptc4 2/2 Running 0 1h
onap-aai aai-service-749944520-0jhxf 1/1 Running 0 1h
onap-mso mariadb-829081257-vx3n1 1/1 Running 0 1h
onap-mso mso-821928192-qp6tn 2/2 Running 0 1h
onap-sdc sdc-be-754421819-phch8 0/2 PodInitializing 0 1h
onap-sdc sdc-cs-2937804434-qn1q6 1/1 Running 0 1h
onap-sdc sdc-es-2514443912-c7fmd 1/1 Running 0 1h
onap-sdc sdc-fe-902103934-rlbhv 0/2 Init:0/1 8 1h
When we look at the logs of this container we can see that there are issues.
The steps to check the logs are below:
1) Run the kubectl command to check the pod status.
kubectl get pods --all-namespaces -a
onap-mso mso-821928192-qp6tn 2/2 Running 0 1h
onap-sdc sdc-be-754421819-phch8 0/2 PodInitializing 0 1h
onap-sdc sdc-cs-2937804434-qn1q6 1/1 Running 0 1h
onap-sdc sdc-es-2514443912-c7fmd 1/1 Running 0 1h
2) Use the docker ps -a command to list the containers.
root@k8s-2:/# docker ps -a | grep sdc-be
347b4da64d9c nexus3.onap.org:10001/openecomp/sdc-backend@sha256:d4007e41988fd0bd451b8400144b27c60b4ba0a2e54fca1a02356d8b5ec3ac0d "/root/startup.sh" 53 minutes ago Up 53 minutes k8s_sdc-be_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_0
2b4cf42b163a oomk8s/readiness-check@sha256:ab8a4a13e39535d67f110a618312bb2971b9a291c99392ef91415743b6a25ecb "/root/ready.py --con" 57 minutes ago Exited (0) 53 minutes ago k8s_sdc-dmaap-readiness_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_3
a066ef35890b oomk8s/readiness-check@sha256:ab8a4a13e39535d67f110a618312bb2971b9a291c99392ef91415743b6a25ecb "/root/ready.py --con" About an hour ago Exited (0) About an hour ago k8s_sdc-be-readiness_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df1f_0
1fdc79e399fd gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD_sdc-be-754421819-phch8_onap-sdc_d7e74e36-da76-11e7-a79e-02ffdf18df
3) Use this command to see the docker logs, filtered for errors and exceptions:
docker logs 347b4da64d9c 2>&1 | grep -iE 'err|exception'
4) Observe the error logs and exceptions.
Currently we are getting the exceptions below:
Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration
[2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] (sdc-normatives::upgrade_Normatives line 7) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.openecomp.sdcrests.health.rest.services.HealthCheckImpl]: Constructor threw exception; nested exception is java.lang.ExceptionInInitializerError
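The log-checking steps above can be sketched as a reusable filter. The helper name `filter_errors` is made up for this example, and the sample input lines are copied from logs quoted in this thread rather than read from a live container.

```shell
#!/usr/bin/env bash
# Print only log lines that mention an error or exception (case-insensitive).
filter_errors() {
  grep -iE 'error|exception'
}

# Sample input: one healthy line and one failing line quoted in this thread.
printf '%s\n' \
  '2017-11-27 08:16:56,877 - INFO - brmsgw is ready!' \
  'Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration' \
  | filter_errors

# Against a live container (ID from the docker ps output above);
# 2>&1 also captures what the container wrote to stderr:
#   docker logs 347b4da64d9c 2>&1 | filter_errors
```

Only the "Recipe Compile Error" line is printed for the sample input.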
We are following the link below for configuration.
https://wiki.onap.org/display/DW/ONAP+on+Kubernetes
We did the cleanup and reinstalled multiple times but got the same issue again and again.
Regards
Atul
Brian Freeman
Installing on Azure - other than setting up the network security groups via portal.azure.com, following the screenshots seemed to go okay up to running cd.sh.
You need to number the steps, since sometimes it's not obvious when you are switching to a new task vs. describing some future or optional part. I had to be careful not to blindly copy/paste, since you have multiple versions in the steps, some with notes like "# below 20171119 - still verifying - do not use", which was confusing. The video has the steps, which is good, but it's tedious to start/stop the video and then look at the next step in the wiki. I will update when it completes.
Do we need to add port 10250 to the security groups? I got error messages on cd.sh (but admittedly I didn't watch that part of the video)
Brian Freeman
It didn't come up cleanly, but perhaps I didn't wait long enough for something.
I did notice that the SDNC containers for dmaaplistener and ueblistener didn't get loaded and SDNC stayed in Init
root@ONAP-OOM1:~# kubectl get pods --all-namespaces | grep sdnc
onap-sdnc sdnc-1395102659-z7r64 0/2 Init:0/1 0 23m
onap-sdnc sdnc-dbhost-3029711096-2s7mg 0/1 ContainerCreating 0 23m
onap-sdnc sdnc-dgbuilder-4267203648-pf5xk 0/1 Init:0/1 0 23m
onap-sdnc sdnc-portal-2558294154-xn94v 0/1 Init:0/1 0 23m
onap-vfc vfc-ztesdncdriver-1452986549-rjj18 1/1 Running 0 23m
Michael O'Brien
Recorded the latest video from a clean EC2 VM - install, run cd.sh - 24m to 87 pods up - running healthcheck now - will post video in 1 hour
Yes, some images may not be in the prepull that is currently hardcoded - so a couple pods take a while
in general vnc-portal and then aai-service are the last to come up
/michael
Brian Freeman
Azure VMs seem to only have a 30GB OS disk. I can add a data disk but I think I should run the install from someplace other than root. Is that simple to change in cd.sh ?
Brian Freeman
Was able to complete bringing up ONAP on Azure through health check except for dcae
A few things missing:
Had to add a data disk to the Ubuntu VM.
Michael O'Brien
Yes, I forgot to point out the requirements on this page - you will need 70G to install ONAP and it will run up to 90G over a week (mostly logs)
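A quick pre-install check against the 70G figure above could look like the following sketch. The helper name `has_enough_disk` is made up for this example, and checking the root filesystem is an assumption; with a separate data disk, as on Azure, the mount point to check would differ.

```shell
#!/usr/bin/env bash
# Succeed when the available space (in whole GB) meets the requirement.
has_enough_disk() {
  local avail_gb="$1" required_gb="$2"
  [ "$avail_gb" -ge "$required_gb" ]
}

# Available space on / in whole GB; df -Pk reports 1024-byte blocks in column 4.
avail_gb=$(df -Pk / | awk 'NR==2 {print int($4 / 1048576)}')

if has_enough_disk "$avail_gb" 70; then
  echo "ok: ${avail_gb}G available"
else
  echo "too small: ${avail_gb}G available, 70G needed"
fi
```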
Curious about Azure filesystems - I had issues with non-EBS in the past - will recheck.
Will raise a JIRA on the HEAT to OOM sync for SDNC
OOM-491
for SDC - would raise a JIRA but I don't see the sanity container in HEAT - I see the same 5 containers in both
Brian Freeman
Check SB01 in Windriver - there is a sanity-check container that runs for their self-health check. I think it's only needed for troubleshooting.
You can see it in nexus
Brian Freeman
You did point out the disk size requirements in the video. The issue is really that AWS makes that a setting at VM create and Azure you have to separately create the data disk (or at least I couldn't find a way to do it on the original create via the portal)
Michael O'Brien
yes I see the docker image - just wondering where the docker container is in the SDC vm - my stack is 2 days old
I only see the front/back end ones, cassandra, elasticsearch and kibana - 5
let us know and I will raise a JIRA for SDC like we did for SDNC
wait - when I get back I'll check the compose file - perhaps it is optional - ok I see it in /data/scripts/docker_run.sh
docker run --detach --name sdc-sanity --env HOST_IP=${IP} --env ENVNAME="${DEP_ENV}" --env http_proxy=${http_proxy} --env https_proxy=${https_proxy} --env no_proxy=${no_proxy} --log-driver=json-file --log-opt max-size=100m --l
it is optional
docker run sdc-sanity
if [ ${RUNTESTS} = true ]; then
but we should run it
raising JIRA to add this optional container
OOM-492
thanks
/michael
Michael O'Brien
BTW, thanks Brian for the review - when I started I brought up HEAT in May 2017 and enumerated all the containers to get a feel - we should have done another pass on all the vms - but without someone who would know the optional ones like in SDC we would have missed the sdc-sanity one - thanks
/michael
Michael O'Brien
You can run the scripts from anywhere - I usually run as ubuntu not root - the reason the rancher script is root is because you would need to log out back in to pick up the docker user config for ubuntu.
I run either directly in /home/ubuntu or /root
The cloned directory will put oom in either of these
For ports - yes, try to open everything - on AWS I run with an all-open CIDR security group for ease of access - on Rackspace the VM would need individual port openings
/michael
Michael O'Brien
Yes, the multiple steps are confusing - trying to help out a 2nd team that is working using Helm 2.7 to use the tpl function - I'll remove those until they are stable
thanks
/michael
Michael O'Brien
Updated wiki - thought I removed all helm 2.6/2.7 - I was keeping the instructions on aligning the server and client until we fix the vnc-portal issue under helm 2.6 - this wiki gets modified a lot as we move through all the rancher/helm/kubernetes/docker versions
Michael Phillip
Hi, I'm new to ONAP and cloud computing in general, but trying to work through the above guide. I'm at the point where I'm waiting for the onap pods to come up. Most have come up, but some seem to be stuck after 2 hrs. I'm wondering if perhaps I have insufficient memory available. I'm installing on a KVM VM with 16 vCPU, 55G RAM and 220G HD.
One thought is to shut down the VM, increase RAM to about 60G and restart, but I'm uncertain as to the potential implications. Any suggestions as to how I could proceed would be greatly appreciated.
Thanks,
Michael
James MacNider
Hi Michael Phillip,
Unless you've taken the step to remove some components from the HELM_APPS variable in the setenv.bash script (after the oom repository was cloned), you very likely require 64 GB of RAM.
I've successfully deployed a subset of the components in a 48GB RAM VM with HELM_APPS set to this:
HELM_APPS=('mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'log')
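With a reduced component list like this one, the components can also be brought up one at a time. The sketch below assumes the `./createAll.bash -n onap -a <app>` invocation quoted elsewhere in this thread; the function name `deploy_apps` and the `DRY_RUN` switch are made up for this example, and by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Deploy each listed component with createAll.bash, one at a time.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
deploy_apps() {
  local namespace="$1"; shift
  local app
  for app in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "./createAll.bash -n ${namespace} -a ${app}"
    else
      ./createAll.bash -n "${namespace}" -a "${app}"
    fi
  done
}

# Print the commands for the first three components of the list above.
deploy_apps onap mso message-router sdnc
```

Setting `DRY_RUN=0` and running from the directory that contains createAll.bash would execute the commands instead of printing them.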
Michael Phillip
Thanks a lot, James. I have 72G on my host, but would like to leave room for additional VMs, like vFirewall. So I'll try removing some components as you suggested. That will give me an opportunity to try the clean up
Thanks again,
Michael
ATUL ANGRISH
Hi Michael,
We tried to bring up the sdc pod in my setup but we are not able to get it up.
onap-sdc sdc-be-754421819-phch8 0/2 PodInitializing 0 1h
onap-sdc sdc-cs-2937804434-qn1q6 1/1 Running 0 1h
onap-sdc sdc-es-2514443912-c7fmd 1/1 Running 0 1h
onap-sdc sdc-fe-902103934-rlbhv 0/2 Init:0/1 8 1h
I think something has changed in the prepull_docker script.
We ran the prepull_docker script using
# from OOM-328 - pulls in sequence
# For branch "release-1.1.0":
curl https://jira.onap.org/secure/attachment/10741/prepull_docker_110.sh > prepull_docker.sh
Anyone who tries to install/deploy the ONAP SDC container will hit an issue with the SDC pod not coming up.
Exceptions:
Recipe Compile Error in /root/chef-solo/cache/cookbooks/sdc-catalog-be/recipes/BE_2_setup_configuration
[2017-12-06T11:53:48+00:00] ERROR: bash[upgrade-normatives] (sdc-normatives::upgrade_Normatives line 7) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.openecomp.sdcrests.health.rest.services.HealthCheckImpl]: Constructor threw exception; nested exception is java.lang.ExceptionInInitializerError
Regards
Atul
Michael O'Brien
Correct, this looks like a standard Spring bean startup error - specific to SDC - which should also be failing in the HEAT deployment. Last night I tested release-1.1.0 to test a merge in oom and all my pods are up except the known aaf - also the CD job is OK
http://jenkins.onap.info/job/oom-cd/621/console
Build #621 (7-Dec-2017 2:00:00 PM)
This bothers me though - I hope we are not missing something that only you are seeing - will look more into it. Are you using 1.1.0 or master? (master may have issues)
see parallel discussion last night
https://lists.onap.org/pipermail/onap-discuss/2017-December/006800.html
Also, which pods are you bringing up? If you check the yaml, there are dependencies.
In your onap-discuss post last night you did not have the dependent pods up - did this fix the issue? I quickly looked at the code and the HealthCheckImpl class is doing healthchecks, which I would expect to fail when dependent pods are not up.
thank you
/michael
Brian Freeman
testsuite (robot) is an older version 1.1-STAGING-latest. How would I upgrade just testsuite to 1.2-STAGING:latest ?
It only loads Demonstration customer not SDN-ETHERNET-INTERNET needed for vCPE
Alexis de Talhouët
The easiest way is to go to the Kubernetes UI, then under the onap-robot namespace, click on the Deployments tab, then click the three dots next to the deployment to update (in this case, robot). A window will pop up where you can edit, among all the deployment parameters, the image version. Then click update. This will bounce the deployment (hence the pod) and create a new deployment with the changes.
Brian Freeman
SDNC org.ops4j.pax.logging.cfg isn't the same as the file in gerrit. I noticed there is a different file in dockerdata-nfs/onap/log/sdnc that appears to come from the OOM repo instead of the CCSDK repo (the same OOM file looks to be used for appc). Why isn't the SDNC logging configuration being used?
Alexis de Talhouët
What you're mentioning, Brian, is the major issue we currently have in OOM:
we need to fork the projects' config in order to adjust it to the Kubernetes context, whether it's for address resolution or for logging. I'll let Michael O'Brien explain what was done for the logs. But the overall purpose wrt logging is to centralize them and have them browsable through a Kibana interface (using logstash).
Regarding the address resolution, Kubernetes provides its own way of resolving services within namespaces: <service>.<namespace>:<internal-port>. Because of this, everywhere in the config where there is some network config, we change it to leverage k8s networking.
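As an illustration of the <service>.<namespace>:<internal-port> form, here is a minimal sketch. The helper name `k8s_service_addr` is made up, and the service name and port in the usage line are hypothetical values chosen for the example, not taken from the OOM config.

```shell
#!/usr/bin/env bash
# Compose a Kubernetes service address as <service>.<namespace>:<internal-port>.
k8s_service_addr() {
  local service="$1" namespace="$2" port="$3"
  echo "${service}.${namespace}:${port}"
}

# Hypothetical values: service "sdnc-dbhost" in namespace "onap-sdnc" on port 3306.
k8s_service_addr sdnc-dbhost onap-sdnc 3306   # prints sdnc-dbhost.onap-sdnc:3306
```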
Michael O'Brien
Brian, yes there is a centralized logging configuration that has the RI in the logging-analytics repo - this ELK stack available on the onap-log kibana container internal port 5601 uses a filebeat container (all the 2/2 pods) to pipe the logs in through a set of PV's using the emptyDir directive in the yaml. A logging spec is being worked out.
Logging User Guide#Quickstart-gettingyourELKDashboardup
I'll update this more shortly.
Brian Freeman
Well the logging team needs to find a solution for the heavy user of the local logs where we turn on DEBUG/TRACE and generate huge amount of log entries while we step through the DG processing. The SDNC logging.cfg also creates the per DG files of data. I guess I can simply replace the file in dockerdata-nfs with the version I can use for support but it seems like we need a better solution that can fit both needs. Can't the logging.cfg support both the common onap logs and the SDNC specific DEBUG logging in the /opt/opendaylight/current/data/log directory ?
ATUL ANGRISH
HI Michael
I am using release 1.1.0. It was working until Monday 4th Dec; after that we cleaned up everything and redeployed the pods to test something in my environment.
After that, sdc-be and sdc-fe never come up. We tried this on 2-3 more setups but the problem still persists.
I suspect the problem is that the prepull_docker.sh script is not able to pull the images currently required for SDC.
/ATUL/oom/kubernetes/sdc/values.yaml
sdcBackend: nexus3.onap.org:10001/openecomp/sdc-backend:1.1-STAGING-latest
sdcFrontend: nexus3.onap.org:10001/openecomp/sdc-frontend:1.1-STAGING-latest
As you can see, all my pods are up except sdc-be and sdc-fe
root@k8s-2:/# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-km9jg 1/1 Running 8 2h
kube-system kube-dns-638003847-z8gnh 3/3 Running 23 2h
kube-system kubernetes-dashboard-716739405-xn4hx 1/1 Running 7 2h
kube-system monitoring-grafana-2360823841-fsznx 1/1 Running 7 2h
kube-system monitoring-influxdb-2323019309-qks0t 1/1 Running 7 2h
kube-system tiller-deploy-737598192-wlhmk 1/1 Running 7 2h
onap config 0/1 Completed 0 1h
onap-aai aai-resources-898583818-6ptc4 2/2 Running 0 1h
onap-aai aai-service-749944520-0jhxf 1/1 Running 0 1h
onap-mso mariadb-829081257-vx3n1 1/1 Running 0 1h
onap-mso mso-821928192-qp6tn 2/2 Running 0 1h
onap-sdc sdc-be-754421819-phch8 0/2 PodInitializing 0 1h
onap-sdc sdc-cs-2937804434-qn1q6 1/1 Running 0 1h
onap-sdc sdc-es-2514443912-c7fmd 1/1 Running 0 1h
onap-sdc sdc-fe-902103934-rlbhv 0/2 Init:0/1 8 1h
onap-sdc sdc-kb-281446026-tvg8r 1/1 Running 0 1h
Thanks
Atul
Michael O'Brien
Atul,
Hi you saw my previous comment on the dependent pods for SDC - do you have those up
http://jenkins.onap.info/job/oom-cd/
I am bringing up a clean release-1.1.0 environment to record an SDC video for another issue - so I will verify this again.
Anyway the healthcheck on the CD server is OK - the only difference is that the images are cached there right now - so on the off chance that the images were removed or not available via nexus3 - this will be seen on a clean EC2 server shortly. ( a real CD server that brings up a clean VM every time is in the works)
/michael
Michael O'Brien
In master (I am also testing a patch) I get the following (ignore aaf)
could be an image issue (different images in 1.1.0 and master) - or a config issue that has not been cherry picked to master yet (we are running the reverse), note portal depends on sdc - sdc is the issue
Make sure you use release-1.1.0 - as this is our stable branch right now
Michael O'Brien
Atul,
See separate mail on onap-discuss - we are stabilizing master - doing the last of Alexis de Talhouët cherry picks from stable release-1.1.0 - then SDC and AAI should come up
I recommend running a full set of pods in release-1.1.0 for now - you can also assist in testing master once the merges are in so we can declare it open for pending feature commits
thank you
/michael
Michael O'Brien
Atul, hi, thanks for the effort helping us stabilize - Alexis de Talhouët and the AAI team have fixed the 2 aai-service and aai-traversal issues that popped up at 10am Friday on release-1.1.0 - you can use that branch again.
OOM-501
/michael
ATUL ANGRISH
Hi Michael,
Are you going to clean and rebuild release 1.1.0 for the prepull_docker images?
Is there any alternative way to proceed?
I tried release 1.1.0 again today in order to bring up all my ONAP components (especially AAI and SDC), but I am facing the same issue. My SDC component does not come up.
Regards
Atul
Michael O'Brien
Atul, hi,
There is no issue with the prepull - it is just a script that greps the docker image tags for all values.yaml - v1.1.0 in most cases.
If you run cd.sh at the top of the page - it will clean your environment and upgrade it - or check out the commands if you want to do it yourself. There is no issue with the release-1.1.0 branch (besides a single not-required aaf container) - the
release-1.1.0 is stable as of 20171208:2300 EDT
As a check, can you cover off each of the steps if you don't use the automated deploy script:
(delete all pods, delete your config pod, remove dockerdata-nfs, source setenv.bash (make sure your onap-parameters.yaml is ok), create config, wait for it, (prepull is optional - it just speeds things up), create pods, run healthcheck, PUT cloud-region to AAI ...)
Remember we have not had an answer yet on your config - sdc will not come up unless dependent pods are up - for example - just try to run everything to start - then fine tune a subtree of pods later.
please try the following script - it is running on the hourly CD server and 3-4 other environments OK
https://github.com/obrienlabs/onap-root/blob/master/cd.sh
Also verify you are running v1.6.10 of rancher with helm 2.3 on both the server and client - and kubectl 1.8+
thank you
/michael
Michael O'Brien
Issue was that my cd.sh was not accounting for dirs other than /root and /home/ubuntu - because of a cd ~/
Fixed the script - thanks Atul and we are good
Sen Shu
Hi.
I am now trying to deploy ONAP on AWS using Kubernetes.
Is it possible to install ONAP components on separate VMs?
For example, one aaf pod installed on a 64G VM
and another aaf pod installed on a 32G VM.
Another question: is a namespace in Kubernetes equivalent to a VM in HEAT, like the aaf VM, aai VM, etc. in the diagram?
Could you please tell me about that?
Best regards
sen
Michael O'Brien
Sen,
Hi, there is a video at the top of this page where I bring up an R4 instance all the way to healthcheck on a single VM
https://wiki.onap.org/download/attachments/8227431/20171206_oom_e2e_aws_install_to_healthcheck.mp4?version=1&modificationDate=1512608899000&api=v2
Yes it is possible to run as many hosts as you like - this is the recommendation for a scalable/resilient system - there is a link to the SDNC initiative above - essentially you need to share the /dockerdata-nfs directory.
SDN-C Clustering on Kubernetes
3. Share the /dockerdata-nfs Folder between Kubernetes Nodes
For your question about affinity - yes, you can assign pods to a specific host - but Kubernetes will distribute the load automatically and handle any failures for you. If you want to change this you can edit the yaml, either in the checked-out repo or live in the Kubernetes console.
There is the global namespace, for example "onap", then the pod/component namespace, "aai" or "aaf" - they combine as onap-aai - so the closest equivalent to a HEAT VM is the pod namespace. However, a pod like onap-aai could have HA containers, where individual containers like aai-resources have 2 copies split across hosts. Parts of a pod could also be split, like aai-resources on one host and aai-service on another. The global namespace allows you to bring up several deployments of ONAP on the same Kubernetes cluster, separated by namespace prefix and port assignment (300xx, 310xx for example).
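The prefix-plus-component naming can be sketched as follows. The helper name `component_namespace` is made up, and "onap2" is a hypothetical prefix for a second deployment on the same cluster.

```shell
#!/usr/bin/env bash
# Combine a global deployment prefix with a component name: onap + aai -> onap-aai.
component_namespace() {
  local prefix="$1" component="$2"
  echo "${prefix}-${component}"
}

# Two deployments side by side; "onap2" is a hypothetical second prefix.
for prefix in onap onap2; do
  for component in aai aaf; do
    component_namespace "$prefix" "$component"
  done
done
```

This prints onap-aai, onap-aaf, onap2-aai and onap2-aaf, one per line.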
Vidhu Shekhar Pandey
Hello,
I have installed ONAP on Kubernetes on a single host machine following the manual instructions
Now I am trying to run the vFW demo in my setup. I am facing an error when I am onboarding the vFW-vSINK VSP using the SDC portal. The error occurs during the asset creation process after the VSP is imported into the catalog. Here is the error, also attaching the screenshot
Error code SVC4614
Status code 400
invalid content Group type org.openecomp.groups.heat.HeatStack does not exist
To give a back ground of the processes followed:
I installed Kubernetes and Rancher. Kubernetes environment was created using Rancher portal and it showed healthy state.
The onap-parameters.yaml file was edited according to my OpenStack setup running on a separate host.
Configuration was generated using
cd oom/kubernetes/config
./createConfig.sh -n onap
Helm APPS exported are
HELM_APPS=('mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'log')
I was able to bring up the ONAP containers individually, one by one, using the script
./createAll.bash -n onap -a XXX (for all Helm apps exported above )
I logged into the ONAP vnc portal and then logged on to SDC portal as designer (cs00008/demo123456!) to onboard the vFW demo VNF.
I created a new VLM which was checked in and submitted successfully
Then created the VSP vFW-vSINK and was able to upload the vFvSINK.zip yaml files, check in and submit the VSP successfully.
Importing this VSP in the catalog went fine but it was while creating the asset that I got this error.
Can someone help and suggest the possible cause?
Thanks,
Vidhu
Alexis de Talhouët
Hi Vidhu, which OOM branch have you used? You must use release-1.1.0 for now. Thanks
Vidhu Shekhar Pandey
Hi Alexis
Thanks for the information. Yes, I am using release-1.1.0. In fact I re-created the pods once again and the error got resolved. Now I have reached a stage where I am able to create and distribute the vFW-vSINK services.
Regards,
Vidhu
pranjal sharma
Hi Vidhu,
How did you recreate the pods?
Thanks
Pranjal
Alan Chang
Dear all,
I use cd.sh to deploy ONAP in my environment. I always get a 500 error code from the robot test of SDC (the same error as http://jenkins.onap.info/job/oom-cd/690/console).
I have checked the logs in sdc-be and got the following error.
Does anyone know how to solve this problem? Looking forward to your reply.
Blessings
Alan JW Chang
Alan Chang
Dear all,
I solved this problem by reinstalling the whole system, beginning with Rancher. Thanks a lot.
Blessings
Alan
Michael O'Brien
Alan, Hi, there are a couple components that fail healthcheck for up to 15 min after the readiness pod marks them as up - the liveness probe needs to be adjusted and the teams need to provide a better /healthcheck url
Unfortunately you experienced this.
SDC-739
As you can see from the graph - the failures are essentially random every hour - even though the CD server runs 3 and waits about 6 min
Kibana CD Dashboard
Beka Tsotsoria
SDC healthchecks fail constantly. Even in the CI build history there is a failure in every build output I checked. Also this graph shows different results now:
Kibana
Even if I wait more than 15 minutes, still no luck. What could be the workaround, any ideas?
UPDATE: I was finally able to get rid of SDC healthcheck failures by reinstalling only SDC several times:
However now I have following failures:
pranjal sharma
Hi Beka,
Were you able to resolve the usecaseui-gui API health check issue above? I am facing the same issue, so it would be great if you have a workaround.
Thanks
Pranjal
Beka Tsotsoria
Hello Pranjal,
No, usecaseui-gui still fails, even in Jenkins: http://jenkins.onap.info/job/oom-cd/2123/console. I have not reached the point where I need these failing services; maybe for most of the use cases they are not needed at all.
Beka
David Perez Caparros
Hi,
Regarding the usecaseui-gui health check issue, try the following in the robot container:
sed -i 's/usecaseui/usecase-ui/g' /var/opt/OpenECOMP_ETE/robot/testsuites/health-check.robot
That solved the issue for me.
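To see what the substitution does before touching the real file, the same sed expression can be applied to a sample string. The input 'usecaseui-gui' is the health check name used in this thread, not a line copied from health-check.robot, and the helper name `fix_usecaseui` is made up for this example.

```shell
#!/usr/bin/env bash
# Apply the same substitution to stdin instead of editing the robot file in place.
fix_usecaseui() {
  sed 's/usecaseui/usecase-ui/g'
}

echo 'usecaseui-gui' | fix_usecaseui                  # prints usecase-ui-gui
echo 'usecaseui-gui' | fix_usecaseui | fix_usecaseui  # still usecase-ui-gui
```

The second line shows that the edit is safe to repeat: once the hyphen is in place, the pattern no longer matches.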
David
pranjal sharma
Hello All,
I was able to create/deploy the vFirewall package (packet generator, sink and firewall VNF) on the OpenStack cloud.
But I am not able to log in to any of the VNF VMs.
When I debugged, I saw that I had not replaced the default public key with our local public key in the PACKET GENERATOR curl JSON UI.
Now I am deploying the VNF again (the same vFirewall package) on the OpenStack cloud, and plan to give our local public key in both the pg and sink JSON APIs.
I have some queries for clarification:
- How can we create a VNF package manually/dynamically using the SDC component (so that we can get into the VNF VM and access its capabilities)?
- I want to implement Service Function Chaining for the deployed vFirewall; please let me know how to proceed with that.
PS: I have installed/deployed ONAP using Rancher on Kubernetes (on an OpenStack cloud platform) without the DCAE component, so I have not been able to use Closed Loop Automation.
Any thoughts will be helpful for us.
Thanks,
Pranjal
user-acfda
Hi All,
Could you please let me know the significance of the curl command in cd.sh (the automated script)?
The curl query in cd.sh (the automated script to install ONAP pods) is failing.
It has three parameters:
1. A JSON file (not sure whether we are supposed to use the same file as specified by the ONAP community or fill in our OpenStack details). I have tried both.
2. A certificate file named aaiapisimpledemoopenecomporg_20171003.crt (which has NOT been attached along with the cd.sh script or specified anywhere else).
3. Another header ( -H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=" ). If I use this header, the script fails. If I remove this header, the PUT succeeds but the GET fails.
I am NOT sure of the significance of the curl command below from cd.sh. I noticed that it is required while I was doing the vFirewall onboarding.
Moreover, the robot scripts (both ./demo-k8s.sh init_robot and ./demo-k8s.sh init) are failing.
init_robot is failing: although we entered test as the password, the HTTP login does not accept it.
The init test case is failing with a 401 authorization error.
Could you please help? Thanks in advance!
cd.sh snippet :
echo "run partial vFW"
echo "curl with aai cert to cloud-region PUT"
curl -X PUT https://127.0.0.1:30233/aai/v11/cloud-infrastructure/cloud-regions/cloud-region/CloudOwner/RegionOne --data "@aai-cloud-region-put.json" -H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=" -H "X-TransactionId:jimmy-postman" -H "X-FromAppId:AAI" -H "Content-Type:application/json" -H "Accept:application/json" --cacert aaiapisimpledemoopenecomporg_20171003.crt -k
echo "get the cloud region back"
curl -X GET https://127.0.0.1:30233/aai/v11/cloud-infrastructure/cloud-regions/ -H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=" -H "X-TransactionId:jimmy-postman" -H "X-FromAppId:AAI" -H "Content-Type:application/json" -H "Accept:application/json" --cacert aaiapisimpledemoopenecomporg_20171003.crt -k
sudo chmod 777 /dockerdata-nfs/onap
./demo-k8s.sh init
Michael O'Brien
Hi, the curls are an AAI PUT and GET on the cloud region - this is required as part of testing the vFW. For you it is optional until you need to test a use case like the vFirewall.
See the details on Running the ONAP Demos
For the AAI cert - this cert is in the aai setup in your dockerdata-nfs. The json file is the body of the PUT - swap in your OpenStack tenant ID.
All of this is AAI specific, check the section on running AAI postman/curls in Vetted vFirewall Demo - Full draft how-to for F2F and ReadTheDocs and Tutorial: Verifying and Observing a deployed Service Instance and Verifying your ONAP Deployment and the AAI team dev page
If your init is failing then your cloud region and tenant are not set - check that you can read them in postman before running robot init (init_robot is only so you can see failures on the included web server - this should pass)
/michael
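One way to check that the cloud region is readable before running robot init is to look for it in the body returned by the cloud-regions GET. A minimal sketch using only the Python standard library; the function name is illustrative and the sample body is a trimmed response of the shape AAI v11 returns for this query:

```python
import json


def find_cloud_region(response_text, cloud_owner, region_id):
    """Return the matching cloud-region entry from a cloud-regions GET body, or None."""
    regions = json.loads(response_text).get("cloud-region", [])
    for region in regions:
        if (region.get("cloud-owner") == cloud_owner
                and region.get("cloud-region-id") == region_id):
            return region
    return None


# Trimmed sample body (assumption: same field names as a real AAI v11 response).
sample = (
    '{"cloud-region":[{"cloud-owner":"CloudOwner","cloud-region-id":"RegionOne",'
    '"sriov-automation":false,"resource-version":"1513572496664"}]}'
)

if find_cloud_region(sample, "CloudOwner", "RegionOne") is None:
    print("cloud region not set - robot init will likely fail")
else:
    print("cloud region found")
```

If the region is missing, re-run the PUT before trying init again.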
user-acfda
Hi Michael,
Thank you so much for the instant response. Glad to notice that all the queries have been addressed. But, still I am facing some errors:
Could you please help!
BR,
Michael O'Brien
Unauthorized means either the encoded user/pass is wrong - it is AAI:AAI - or you don't have the AAI cert (old or 2018 new one)
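When chasing a 401 it can help to see which account a header actually carries: the value after Basic is just base64 of user:password. A minimal sketch using only the Python standard library (the helper names are illustrative, not part of ONAP). The header used in the cd.sh snippet decodes to ModelLoader:ModelLoader.

```python
import base64
from typing import Tuple


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Split an HTTP Basic authorization header value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


def encode_basic_auth(user: str, password: str) -> str:
    """Build the header value for a given user and password."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Header value copied from the cd.sh curl commands.
print(decode_basic_auth("Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI="))
# Header value to try for the AAI:AAI account.
print(encode_basic_auth("AAI", "AAI"))
```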
I added a cert to this page - it is in the demo and oom repos as well - also you can get it exported from firefox.
A post from amsterdam.onap.info - the first is from the put, the rest are from robot init
ubuntu@ip-172-31-92-101:~$ curl -X GET https://127.0.0.1:30233/aai/v11/cloud-infrastructure/cloud-regions/ -H "authorization: Basic TW9kZWxMb2FkZXI6TW9kZWxMb2FkZXI=" -H "X-TransactionId:jimmy-postman" -H "X-FromAppId:AAI" -H "Content-Type:application/json" -H "Accept:application/json" --cacert aaiapisimpledemoopenecomporg_20171003.crt -k
{"cloud-region":[{"cloud-owner":"CloudOwner","cloud-region-id":"RegionOne","sriov-automation":false,"resource-version":"1513572496664","relationship-list":{"relationship":[{"related-to":"complex","related-link":"/aai/v11/cloud-infrastructure/complexes/complex/clli1","relationship-data":[{"relationship-key":"complex.physical-location-id","relationship-value":"clli1"}]}]}},{"cloud-owner":"CloudOwner","cloud-region-id":"IAD","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572501497"},{"cloud-owner":"CloudOwner","cloud-region-id":"HKG","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572502146"},{"cloud-owner":"CloudOwner","cloud-region-id":"DFW","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572502465"},{"cloud-owner":"CloudOwner","cloud-region-id":"ORD","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572502756"},{"cloud-owner":"CloudOwner","cloud-region-id":"SYD","cloud-type":"SharedNode","owner-defined-type":"OwnerType","cloud-region-version":"v1","cloud-zone":"CloudZone","sriov-automation":false,"resource-version":"1513572501824"}]}
Michael O'Brien
fyi guys make sure to use aai v11 not v8 - for example
AAI-564
user-acfda
Hi Michael,
1. I am sure that every person needs to fill in their OWN OpenStack details (rather than using the defaults in the AAI json file).
The reason is that the init robot is still failing. If the robot testcase picks up our OpenStack details via the onap-parameters.yaml file (rather than the defaults in the shared json file), then we should definitely pass only our own OpenStack details in the AAI json file. Please advise!
2. Also, I think we need to create a separate region (RegionThree, for example) with our OpenStack details, to make new entries in AAI.
3. Also, as discussed, I have checked the integration robot file used by ONAP robot; the AAI username and password are as follows:
"/dockerdata-nfs/onap/robot/eteshare/config/integration_robot_properties.py"
GLOBAL_AAI_SERVER_PROTOCOL = "https"
GLOBAL_AAI_SERVER_PORT = "8443"
GLOBAL_AAI_USERNAME = "AAI"
GLOBAL_AAI_PASSWORD = "AAI"
4. I notice that the AAI logs are not being updated when we run these curl queries that enter data into AAI. Could you please let me know how to enable AAI logs?
The last update I can see in the AAI logs on my system is from 12th Dec. But for the past few days we have constantly been running curl queries to enter data into AAI.
I have logged in to the AAI-SERVICES container but no AAI logs can be seen. Screenshot attached for your reference.
5. Moreover, aai-services is not present in the dockerdata-nfs folder. Not sure why? Other sub-modules are present though.
user-acfda
Hi Michael,
Could you please let us know how to add a new object (cloud-owner) and a new region in AAI?
We need the curl query and the json file required to add a new object and a new region.
In our OpenStack setup we have "admin" as the user/cloud-owner, and we are trying to add our OpenStack details into AAI.
We also need the curl query to add a new region, say "RegionFour", as given in "cloud-region-id".
Our OpenStack details:
{
"cloud-owner": "admin",
"cloud-region-id": "RegionFour",
"cloud-region-version": "2",
"cloud-type": "openstack",
"cloud-zone": "nova",
"owner-defined-type": "publicURL"
}
Original aai-cloud-region-put.json file:
cat aai-cloud-region-put.json
{
"cloud-owner": "CloudOwner",
"cloud-region-id": "RegionOne",
"cloud-region-version": "v2",
"cloud-type": "SharedNode",
"cloud-zone": "CloudZone",
"owner-defined-type": "OwnerType",
"tenants": {
"tenant": [{
"tenant-id": "{TENANT_ID}",
"tenant-name": "ecomp-dev"
}]
}
}
Best Regards,
Shubhra
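A rough sketch of how the PUT for a new region could be assembled, assuming it follows the same URL pattern as the RegionOne PUT in cd.sh (cloud-owner and cloud-region-id as the last two path segments). Only the Python standard library is used and nothing is sent; the function name is illustrative, and whether AAI accepts these particular values is not confirmed here.

```python
import json
from urllib.parse import quote

# Host, port and API version as used in the cd.sh curl commands.
AAI_BASE = "https://127.0.0.1:30233/aai/v11"


def cloud_region_put(cloud_owner, region_id, tenant_id, tenant_name, extra=None):
    """Return (url, json_body) for a cloud-region PUT; does not perform the request."""
    url = "{}/cloud-infrastructure/cloud-regions/cloud-region/{}/{}".format(
        AAI_BASE, quote(cloud_owner, safe=""), quote(region_id, safe=""))
    body = {"cloud-owner": cloud_owner, "cloud-region-id": region_id}
    body.update(extra or {})
    body["tenants"] = {
        "tenant": [{"tenant-id": tenant_id, "tenant-name": tenant_name}]
    }
    return url, json.dumps(body, indent=2)


# Values taken from the comment above; {TENANT_ID} stays a placeholder.
url, body = cloud_region_put(
    "admin", "RegionFour", "{TENANT_ID}", "ecomp-dev",
    extra={
        "cloud-region-version": "2",
        "cloud-type": "openstack",
        "cloud-zone": "nova",
        "owner-defined-type": "publicURL",
    },
)
print(url)
print(body)
```

The printed body can be saved to a file and passed to curl with --data "@file.json" in the same way as aai-cloud-region-put.json.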
Michael O'Brien
Use Kubernetes 1.8.6 for now - not the just released 1.9.0 - https://github.com/kubernetes/kubernetes/issues/57528
OOM-522
Vaibhav Chopra
Yes, I found that with K8s 1.9 and the Amsterdam release, image pull secrets fail.
Gary Wu
I set up two parallel OOM environments with docker 1.12.6, rancher 1.6.10, kubernetes 1.8.6, and helm 2.3.
On both of these, after the initial spin up, SDC would fail health checks with a 500 error even though all 5 SDC containers are running.
The SDC healthCheck API returns content as follows:
Once I restarted SDC via:
Then the SDC health check passes.
Is this a currently known issue?
Mohamed Aly ould Oumar
Hi,
This has been a known issue for a long time, and they don't have a solution for it.
Please don't bother yourself, it won't work no matter what you do.
I have installed ONAP more than 20 times, and even when everything is running, it always gives the same 500 error.
They haven't fixed it and they don't admit it.
Michael O'Brien
Gary, Mohamed,
Hi, We appreciate your exercising of the system. You likely have run into a couple issues we currently have with SDC healthcheck and Kubernetes liveness in general. Please continue to raise any jiras on issues you encounter bringing up and running ONAP in general. SDC is currently the component with the least accurate healthcheck in Kubernetes or Heat.
Currently SDC passes healthcheck about 74% of the time - if we wait about 8 min after the readiness probe declares all the containers as ready 1/1. The issue with SDC (26%), SDNC(8%), APPC (1%) in general is that their exposed healthcheck urls do not always report the system up at the appropriate time.
The workaround is to delay healthcheck for now until the containers have run for a bit - 5-10 min - which is a normal warming of the system and caches in a production system.
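The delay-then-retry workaround can be sketched as a small helper. This is only an illustration in Python: the probe is any callable that returns True when the health check passes, and the default delays and attempt count are placeholder values, not settings taken from the CD job.

```python
import time


def wait_until_healthy(probe, initial_delay_s=300, interval_s=60,
                       attempts=3, sleep=time.sleep):
    """Wait for warm-up, then call probe() up to `attempts` times.

    Returns the attempt number that passed, or None if none did.
    """
    sleep(initial_delay_s)  # let containers and caches warm up first
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(interval_s)  # pause before the next try
    return None


# Demo with a stand-in probe and a no-op sleep so nothing actually waits.
fake_results = iter([False, True])
passed_on = wait_until_healthy(lambda: next(fake_results), sleep=lambda s: None)
print("health check passed on attempt", passed_on)
```

In a real run the probe would call the component's health check URL and the sleep argument would be left at its default.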
On the CD system, SDC comes up eventually 2/3 of the time - our issue is helping OOM and the component teams adjust the healthcheck endpoints to report proper liveness (not just 200 or a subset of rest functionality) - You both are welcome to help us with these and any other of our outstanding issues - we are expanding the team.
OOM SDC healthcheck failure 26% of the time even with 3 runs and 8 min wait state
SDC-739
The following is in progress and can also be reviewed
SDC-715
Related SDC issue in HEAT
SDC-451
Any assistance with the following is appreciated.
OOM-496
thank you
Michael O'Brien
Gary Wu
In my case, the SDC never passed health checks even after waiting a couple of hours after everything is "Running" in kubectl. They passed health checks only after I restarted SDC. Which JIRA issue do you think this info is applicable to?
Rahul Sharma
Gary Wu: For me, restarting SDC helped fix the Health-check. However when launching SDC UI, it failed to open (even though Health check was now passing).
For SDC-UI to work:
./deleteAll.bash -n onap;
./createAll.bash -n onap
Gary Wu
For this, I had to fix /etc/hosts in vnc-portal to change the SDC IP addresses since they change once you restart SDC.
However, I think I'm going to just re-deploy the entire ONAP until SDC passes the health check, since I don't know what other things become out-of-date if SDC is restarted by itself.
Xiaobo Chen
I also hit the same SDC problem after deploying ONAP. The health check still did not pass even 10 minutes after I restarted SDC (./deleteAll.bash -n onap -a sdc and ./createAll.bash -n onap -a sdc). It seems all SDC components were up except TITAN. I checked the log in container sdc-be: /var/lib/jetty/logs/SDC/SDC-BE/error.log.3, and found that the Titan graph failed to initialize with an exception thrown: com.thinkaurelius.titan.core.TitanException. Any suggestion as to why Titan cannot work?
{
"sdcVersion": "1.1.0",
"siteMode": "unknown",
"componentsInfo": [
{
"healthCheckComponent": "BE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
},
{
"healthCheckComponent": "TITAN",
"healthCheckStatus": "DOWN",
"description": "Titan graph is down"
},
{
"healthCheckComponent": "DE",
"healthCheckStatus": "UP",
"description": "OK"
},
{
"healthCheckComponent": "CASSANDRA",
"healthCheckStatus": "UP",
"description": "OK"
},
{
"healthCheckComponent": "ON_BOARDING",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK",
"componentsInfo": [
{
"healthCheckComponent": "ZU",
"healthCheckStatus": "UP",
"version": "0.2.0",
"description": "OK"
},
{
"healthCheckComponent": "BE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
},
{
"healthCheckComponent": "CAS",
"healthCheckStatus": "UP",
"version": "2.1.17",
"description": "OK"
},
{
"healthCheckComponent": "FE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
}
]
},
{
"healthCheckComponent": "FE",
"healthCheckStatus": "UP",
"version": "1.1.0",
"description": "OK"
}
]
}
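To pick out the failing parts of a response like this one without reading it by eye, the nested componentsInfo lists can be walked recursively. A minimal sketch in Python; the function name is illustrative and the sample is a trimmed copy of the response above.

```python
import json


def down_components(health, path=""):
    """Return [(component path, description)] for every entry whose status is not UP."""
    found = []
    for comp in health.get("componentsInfo", []):
        name = path + comp.get("healthCheckComponent", "?")
        if comp.get("healthCheckStatus") != "UP":
            found.append((name, comp.get("description", "")))
        # ON_BOARDING carries its own nested componentsInfo list.
        found.extend(down_components(comp, name + "/"))
    return found


sample = json.loads("""
{
  "sdcVersion": "1.1.0",
  "componentsInfo": [
    {"healthCheckComponent": "BE", "healthCheckStatus": "UP", "description": "OK"},
    {"healthCheckComponent": "TITAN", "healthCheckStatus": "DOWN",
     "description": "Titan graph is down"},
    {"healthCheckComponent": "ON_BOARDING", "healthCheckStatus": "UP",
     "description": "OK",
     "componentsInfo": [
       {"healthCheckComponent": "ZU", "healthCheckStatus": "UP", "description": "OK"}
     ]}
  ]
}
""")

for name, description in down_components(sample):
    print(name, "-", description)
```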
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<** createGraph started **>
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<** open graph with /var/lib/jetty/config/catalog-be/titan.properties started>
2018-01-08T09:59:09.532Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<openGraph : try to load file /var/lib/jetty/config/catalog-be/titan.properties>
2018-01-08T09:59:10.719Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:10.726Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:15.580Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:15.581Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:16.467Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: 10.42.243.240>
2018-01-08T09:59:16.468Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<RemoveHost: sdc-cs.onap-sdc>
2018-01-08T09:59:23.938Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.t.g.c.GraphDatabaseConfiguration||ActivityType=<?>, Desc=<Set default timestamp provider MICRO>
2018-01-08T09:59:23.946Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.t.g.c.GraphDatabaseConfiguration||ActivityType=<?>, Desc=<Generated unique-instance-id=0a2a0d4d395-sdc-be-1187942207-21tfw1>
2018-01-08T09:59:23.956Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:23.956Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.052Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.ConnectionPoolMBeanManager||ActivityType=<?>, Desc=<Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=KeyspaceTitanConnectionPool,ServiceType=connectionpool>
2018-01-08T09:59:24.052Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.153Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<AddHost: 10.42.243.240>
2018-01-08T09:59:24.153Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.n.a.c.i.CountingConnectionPoolMonitor||ActivityType=<?>, Desc=<RemoveHost: sdc-cs.onap-sdc>
2018-01-08T09:59:24.164Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||c.t.titan.diskstorage.Backend||ActivityType=<?>, Desc=<Initiated backend operations thread pool of size 96>
2018-01-08T09:59:34.186Z|||||main|||SDC-BE||||||||INFO||||10.42.13.77||o.o.s.be.dao.titan.TitanGraphClient||ActivityType=<?>, Desc=<createGraph : failed to open Titan graph with configuration file: /var/lib/jetty/config/catalog-be/titan.properties>
com.thinkaurelius.titan.core.TitanException: Could not initialize backend
at com.thinkaurelius.titan.diskstorage.Backend.initialize(Backend.java:301) ~[titan-core-1.0.0.jar:na]
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1806) ~[titan-core-1.0.0.jar:na]
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:123) ~[titan-core-1.0.0.jar:na]
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94) ~[titan-core-1.0.0.jar:na]
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:62) ~[titan-core-1.0.0.jar:na]
at org.openecomp.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:256) [catalog-dao-1.1.0.jar:na]
at org.openecomp.sdc.be.dao.titan.TitanGraphClient.createGraph(TitanGraphClient.java:207) [catalog-dao-1.1.0.jar:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_141]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_141]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:366) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:311) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:134) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:408) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1575) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:207) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1131) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1059) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:835) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:741) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:467) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1128) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1022) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:512) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) [spring-beans-4.3.4.RELEASE.jar:4.3.4.RELEASE]
Xiaobo Chen
I solved this problem by reinstalling the SDC component several times. To make the web UI work, I had to change /etc/hosts in the PORTAL VNC.
ramki krishnan
From what I have seen so far, health check seems to succeed immediately after containers are ready provided the worker node has enough CPU/Memory. In my case, the worker node had 48 vCPUs and 64GB RAM.
Gary Wu
What is the current status on DCAE? Any specific instructions for starting up DCAE?
Gary Wu
Also, looks like if we use DEMO_ARTIFACTS_VERSION: "1.1.1" then multiple containers fail to start?
Syed Atif Husain
I have deployed ONAP OOM using cd.sh but can't get to the portal.
Two onap-portal pods are failing; the logs say:
"portalapps" in pod "portalapps-1783099045-zkfg8" is waiting to start: trying and failing to pull image
"vnc-portal" in pod "vnc-portal-3680188324-kzt7x" is waiting to start: PodInitializing
I tried deleting and creating but it did not help. Please advise.
Rahul Sharma
Syed Atif Husain: For PortalApps, looks like your system was unable to pull the image. One way to work around is to manually pull the image and also change the pullPolicy from Always to IfNotPresent (under $OOM_HOME/kubernetes/portal/values.yaml - see here).
For vnc-portal, the Pod would stay in 'PodInitializing' until the portalapps starts up, as it's defined as init-container dependency for vnc-portal (see here).
Syed Atif Husain
Thanks Rahul Sharma I tried that, but manual pull is failing
# docker pull nexus3.onap.org:10001/onap/portal-apps:v1.3.0
v1.3.0: Pulling from onap/portal-apps
30064267e5b8: Already exists
a771fb3918f8: Already exists
e726f32f5234: Already exists
f017a45e77ce: Already exists
a0726cff2538: Already exists
0edfd34a7120: Already exists
60f8916f4ad6: Already exists
d705b1b28428: Already exists
f60cc3eb4fd3: Already exists
d3f1c4df222e: Already exists
6ae6daeaff5c: Already exists
cc77e52e0609: Already exists
a5524884a276: Extracting [==================================================>] 6.893 MB/6.893 MB
964a83c06e36: Download complete
a0292615b06b: Download complete
e8af69e9e3e4: Download complete
d7a3048354e6: Download complete
failed to register layer: open /var/lib/docker/aufs/layers/d1ce30bb68ec6a15ab6eb8d4b3593cd36a89c99f8d484dfe8653d23e298a5093: no such file or directory
Rahul Sharma
Looks like the image was pulled but extraction is having issues. Not sure what the reason is - do you have enough space on your system?
Syed Atif Husain
looks like a docker issue, I am able to pull the image on the other vm
Brian Freeman
I needed to restart the sdnc dgbuilder container after loading DGs via the mulitple_dgload.sh and k8 started a new instance before I could do a docker start. What is the mechanism to restart a container to pick up a change made on persistent storage for the container?
Alexis de Talhouët
Either through the GUI: in the onap-sdnc namespace, under pods, delete the pod, and K8S will automatically restart it. Or through the CLI:
kubectl --namespace=onap-sdnc delete pods <pod-name>
Make sure to delete the pod, not the deployment.
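If this needs to be scripted, the same delete can be wrapped in a small helper. A sketch in Python, assuming kubectl is on the PATH; the function names and the pod name in the example are made up for illustration, and nothing is executed unless restart_pod is called.

```python
import subprocess


def delete_pod_command(namespace, pod_name):
    """Build the kubectl argv that deletes one pod (not the deployment)."""
    return ["kubectl", "--namespace=" + namespace, "delete", "pods", pod_name]


def restart_pod(namespace, pod_name, runner=subprocess.run):
    """Delete the pod; Kubernetes re-creates it from the deployment."""
    return runner(delete_pod_command(namespace, pod_name), check=True)


# Only prints the command; the pod name here is hypothetical.
print(" ".join(delete_pod_command("onap-sdnc", "sdnc-dgbuilder-1234")))
```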
Brian Freeman
Is that the same as a docker stop , docker start ? delete seems like it would be more like a docker rm ?
Alexis de Talhouët
It's exactly a docker rm. With K8S you never stop start a container, you rm and re-create it (this is done automatically by K8S when a pod is deleted). So if the changed data is persisted, then it's ok to delete the pod, hence delete the container, because the new one will pick up the new data.
K8S deployment manifest defines the contract for the pod, which in the end is the container. Deleting the pod does delete the container, and kubernetes, based on the deployment manifest, will re-create it. Hope it clarifies things.
Brian Freeman
It does clarify things but we will have to make sure the things we did in Docker like edit a file inside the container and do a stop/start or restart can be done in K8. This is actually a problem in debugging where the project teams will have to make changes to support debugging in K8. We had setup shared data in the container configuration so that we can edit values and then delete the pod to pick up the new values. This will be a tedious pain.
Alexis de Talhouët
At the end of the day, a docker stop / docker start is just a lazy way to restart the process(es) running within the container. If the process(es) to restart are not tied to the docker liveness (e.g. PID 1), then instead of stopping and starting the container, we could simply stop and start the process within the container. I'm not too scared about this being a pain to debug, but we will see; I doubt I'm familiar enough with all of them (there are around 80 containers as of today for the whole ONAP).
Brian Freeman
I think we need to add a volume link (-v in docker) for each app that we might need to modify configuration and do a restart - dgbuilder for instance has a script to bulk load DG's into the flows.json file but this file would be lost whenever the dgbuilder/node-red pod is restarted right now. This would not happen in regular docker on a stop/start or restart.
Brian Freeman
We need to take a running instance of ONAP using OOM and change each application in some normal way and then restart to confirm that on a restart we aren't losing data. This is something we did in the HEAT/Docker/DockerCompose environment to make sure all the persistent storage settings were correct. Since k8 does a recreate instead of a restart we may lose file based configuration data. I would look at: add vFW netconf mount to APPC, add a flow to DG builder, create and distribute a model, instantiate a vFW, execute a closed loop policy on the vFW and vDNS; then restart all containers and confirm that the data created is still there and the same control loops still run. I suspect right now with an OOM installation that parts might not survive a docker stop and K8 re-create of the container (since we can't do a docker start).
Andrew Fenner
Hi,
I'm new to Kubernetes and to OOM, so the following question could have an obvious answer that I've completely missed.
Is there a reason not to use the following commands to expose the K8s containers, so that you don't have to log on via the VNC server, which is just a pain?
kubectl expose services portalapps --type=LoadBalancer --port 8989 --target-port=8080 --name=frontend -n onap-portal
kubectl expose services sdc-fe --type=LoadBalancer --port 8181 --target-port=8181 --name=frontend -n onap-sdc
kubectl expose services vid-server --type=LoadBalancer --port 8080 --target-port=8080 --name=frontend -n onap-vid
This exposed the portal, Vid and SDC so then the K8S services could be used directly. Then the ip address to use can just be found using
kubectl get services --all-namespaces=true | grep -i frontend
or you can assign the IP address using --external-ip=w.x.y.z
Then I just updated the hosts file as "normal"
Thanks
/Andrew
Michael O'Brien
Good question, I guess we live with port mapping requiring the vnc-portal so we can run multiple environments on the same host each with 30xxx, 31xxx etc.. but in reality most of us by default run one set of ONAP containers. Myself when I work in postman I use the 30xxx ports except for using the SDC gui - in the vnc-portal.
I think we need a JIRA to run ONAP in an effective single port mapping config where 8989 for example maps to 8989 outside the namespace and not 30211 - for ease of development.
OOM-562
Brian Freeman
How would I add
/opt/onap/sdnc/dgbuilder/releases
as a directory that is mapped from the host file system so that updates to the flows.json file in /opt/onap/sdnc/dgbuilder/releases/sndc1.0/flows/flows.json would persist across restarts/recreates of the container ?
alternatively is there a way to temporarily set the restart policy to never so that we can manually update flows.json and then restart the existing container ?
Alexis de Talhouët
Brian,
To do so, update the sdnc dgbuilder deployment file, to add the following
This means you will mount the volume identified by the name to the specified mountPath.
The name here has to be the same as the one specified above; it serves as an ID to correlate the mounted folder.
The hostPath here implies that you have created on the host the folder /dockerdata-nfs/{{ .Values.nsPrefix }}/sdnc/dgbuilder/releases (where {{ .Values.nsPrefix }} is onap) and put the data you wish to persist in there.
With those additions, here is how the sdnc dgbuilder deployment would look:
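As a rough illustration of the two additions described here (a volumeMounts entry on the container and a matching hostPath entry under volumes), the change can be expressed as a patch on the deployment structure. This is a Python sketch over a plain dict, not the actual OOM template; the container name and volume name are hypothetical, while the two paths are the ones discussed in this thread.

```python
import copy
import json


def add_host_path_volume(deployment, container_name, volume_name,
                         mount_path, host_path):
    """Return a copy of the deployment with a hostPath volume mounted in one container."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    # Volume definition: the name ties it to the mount below.
    pod_spec.setdefault("volumes", []).append(
        {"name": volume_name, "hostPath": {"path": host_path}})
    for container in pod_spec["containers"]:
        if container["name"] == container_name:
            container.setdefault("volumeMounts", []).append(
                {"name": volume_name, "mountPath": mount_path})
            break
    else:
        raise KeyError(container_name)
    return patched


# Minimal stand-in for the dgbuilder deployment (names are hypothetical).
base = {"spec": {"template": {"spec": {
    "containers": [{"name": "sdnc-dgbuilder-container"}]}}}}

patched = add_host_path_volume(
    base, "sdnc-dgbuilder-container", "dgbuilder-releases",
    "/opt/onap/sdnc/dgbuilder/releases",
    "/dockerdata-nfs/onap/sdnc/dgbuilder/releases")
print(json.dumps(patched, indent=2))
```

The same name appearing in both entries is what links the host folder to the path inside the container.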
Brian Freeman
I made the changes to: /opt/onap/oom/kubernetes/sdnc/templates/dgbuilder-deployment.yaml
I created the release directory: /dockerdata-nfs/onap/sdnc/dgbuilder/releases
I stopped the current container but the restarted container didn't seem to write to the dockerdata-nfs directory ?
Do I need to redeploy the dgbuilder via rancher or kubectl somehow ?
Brian Freeman
kubectl -n onap-sdnc edit deployment/sdnc-dgbuilder
caused a redeployment but dgbuilder didn't like the hostPath since files it was expecting aren't on the host until the dgbuilder image is pulled. Not sure if it's a permissions problem on the host directories.
Should we be using something more like EmptyDir{} (but that doesn't seem to take a path) ?
Alexis de Talhouët
Brian, I forgot to mention that the data has to be put in the persisted directory on the host first. Mounting the host directory will overwrite the directory in the container. So the first time, all the data must be in the persisted directory (on the host). Then, when you start the pod, the persisted data will be mounted in the container. From there, you can edit the persisted data either from the server or from the pod itself.
Brian Freeman
OK that worked.
Michael O'Brien
Brian,
Hi again, Very Good idea. A lot of the applications need a way to either expose config (log, db config) into the container or push data out (logs) to a NFS mapped share on the host. My current in-progress understanding of Kubernetes is that it wraps docker very closely and adds on top of docker where appropriate. Many of the docker commands exec, log, cp are the same as we have seen. For static persistent volumes there are already some defined in the yamls using volumeMounts: and volumes:. We also have dynamic volumes (specific to the undercloud VIM) in the SDNC clustering poc - https://gerrit.onap.org/r/#/c/25467/23. We still need places where volume mounts can be done to the same directory that already has an emptyDir stream into Filebeat (which has a volume under the covers) - see
For example the following has a patch that exposes a dir into the container just like a docker volume or a volume in docker-compose - the issue here is mixing emptyDir (exposing dirs between containers) and exposing dirs outside to the FS/NFS
https://jira.onap.org/browse/LOG-52
This is only one way to do a static PV in K8S
https://jira.onap.org/secure/attachment/10436/LOG-50-expose_mso_logs.patch
I have used these existing volumes that expose the logback.xml file for example to move files into a container like the MSO app server in kubernetes from /dockerdata-nfs instead of using kubectl cp.
I myself will also look into PV's to replace the mounts in the ELK stack for the CD job - that is being migrated from docker-compose to Kubernetes and for the logging RI containers.
Going through this documentation now - to get more familiar with the different PV options - https://kubernetes.io/docs/concepts/storage/persistent-volumes/
As for the question about whether we can hold off on container restarts to be able to manually update a JSON file exposed into the container: the model of Kubernetes auto-scaling is stateless. When I push pods without affinity rules, the containers randomly get assigned to any host, and bringing down a container, either manually or because of a health-initiated trigger, is usually out of the control of any OSS outside of Kubernetes - but there are callbacks. Rancher and Kubeadm for example are northbound to Kubernetes and act as VIMs, and in the same way that a spot VM going down in EC2 gives a 2 min warning, I would expect we could register as a listener to at least a pre-stop of a container - even though it is only a second or 2. I would also like to verify this and document all of this on our K8S devops page - all good questions that we definitely need answers for.
/michael
Brian Freeman
I had to modify cd.sh to change the parameters to deleteAll.sh.
#oom/kubernetes/oneclick/deleteAll.bash -n onap -y yes
oom/kubernetes/oneclick/deleteAll.bash -n onap
I was getting an error message since "-y" wasn't an allowed argument. Is cd.sh checked into gerrit.onap.org somewhere so we can reference that instead of the copy on the wiki? Maybe I'm just looking in the wrong spot.
Michael O'Brien
Brian, hi, you are using amsterdam - the change done by Munir has not been ported from master.
I retrofitted the CD script to fix the jenkins job and patched github to align with the new default prompt behaviour of deleteAll
yes, ideally all the scripts northbound of deleteAll should be in onap - I will move the cd.sh script into a ci/cd folder in OOM or in demo - as it clones oom inside.
Also, I'll put in an if statement on the delete special to amsterdam to not require the -y option
OOM-528
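The branch check mentioned above could be sketched as follows. This is only an illustration, assuming the branch name is available in the same form as the value given to `cd.sh -b`; `delete_all_args` is a hypothetical helper, not part of cd.sh.

```shell
# delete_all_args BRANCH
# Prints the arguments to pass to deleteAll.bash for the given branch.
# Assumption: only amsterdam still lacks the -y option; master has it.
delete_all_args() {
  if [ "$1" = "amsterdam" ]; then
    echo "-n onap"
  else
    echo "-n onap -y yes"
  fi
}

# Usage inside a script (word splitting of the result is intended):
#   oom/kubernetes/oneclick/deleteAll.bash $(delete_all_args "$BRANCH")
```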
Michael O'Brien
Actually I think this will be an issue for anyone master/amsterdam that has cloned before OOM-528 - essentially we need a migration plan
In my case I brought up an older image of master before the change - and the cd.sh script with the -y option fails (because it is not resilient ) on -y
Therefore unfortunately anyone on an older branch either needs to do a git pull or edit cd.sh one-time to remove the -y - after that you are ok and effectively upgraded to OOM-528
I will add a migration line to the last onap-discuss on this
https://lists.onap.org/pipermail/onap-discuss/2018-January/007198.html
hope this helps
thank you
/michael
user-f297f
Good Morning,
I am new to ONAP and yesterday I did setup ONAP on a permanent AWS m4 large instance which uses Dynamic public IP. Today, I removed existing ONAP environment and recreated new environment in Rancher. After adding the environment when I am trying to add host, rancher is not detecting new public IP. In the register command rancher is still referring to yesterday's public IP which is not valid.
Please let me know the steps required to restart ONAP on a Dynamic IP based server which needs to be shutdown and restarted on daily basis.
Thank you in advance
Best Regards,
Nagaraj
user-f297f
I was able to restart ONAP after restart of a Dynamic IP based server by doing following :
Before Shutdown :
a) Remove Host and ONAP environment from Rancher.
b) Remove .kube/config file before shutting down the server.
After Reboot :
c) Perform the steps required for registering the server with new IP on Rancher i.e.,by adding ONAP environment and host with new IP in Rancher,
d) Register server in Rancher by executing the command for registration provided in Add Host Page.
e) Generate Config in CLI page of ONAP Environment in Rancher and copy the content to .kube/config file on server.
f) Run command "cd.sh -b amsterdam", to drop and recreate namespace, containers and pods in K8s.
Please let me know if above approach is correct or is there any better way of starting ONAP on restart of a server with Dynamic IP.
Best Regards,
Nagaraj
Michael O'Brien
Nagaraja,
Hi, that is a common issue with Rancher - it needs a static IP or DNS name.
You have a couple workarounds, elastic IP, elastic IP + domain name, edit the host registration URL in rancher, or docker stop/rm rancher and rerun it
I opt for elastic IP + DNS entry - in my case I register onap.info in Route53, create an EIP in the EC2 console, then associate the EIP with the labelled instance ID network ID before bringing up rancher/kubernetes/helm.
This will also allow you to save the AMI and bring it up later with a 20 min delay until it is fully functional - provided you keep the EIP and domain A record.
This is how the CD system works - see the following, but do not touch anything: it is used for deployment testing for the first 57 min of the hour. http://amsterdam.onap.info:8880/
ONAP on Kubernetes on Amazon EC2#AllocateanEIPstaticpublicIP(one-time)
ONAP on Kubernetes on Amazon EC2#CreateaRoute53RecordSet-TypeA(one-time)
ONAP on Kubernetes on Amazon EC2#AssociateEIPwithEC2Instance
Otherwise I recommend editing 8880/admin/settings and entering the new host registration URL/IP/DNS-name
let me know
/michael
Michael O'Brien
Sorry I was answering your first question from memory this morning - didn't realize you added a 2nd comment with your workaround - yes that is OK but we agree - a lot of work. What you may do - and I will try is a very small static IP for the host a 4G machine that does not run the ONAP pods - they will all have affinity to a 2nd 64G host that has a dynamic IP - but the server must be static.
Another workaround that I have not tried is an automated host authentication via REST or CLI - this I need to research.
But still the easier way is to bring up the EC2 VM with an EIP (it will cost $2 per month when not used though) - You should have an allocation of 5 on your AWS account - I asked for 10.
/michael
user-f297f
Thank you Michael
I will try the EC2 VM with an EIP option.
Best Regards,
Nagaraj
Hong Guan
Hi Michael ,
We ran prepull_docker.sh on 4 different k8s nodes at the same time, we got 75,78,80 and 81 images (docker images | wc -l), we verified the pulling process using (ps -ef | grep docker | grep pull), all pulling processes were completed. Do you know why we got different number images?
Thanks,
Hong
Michael O'Brien
Hong,
Yes, weird - intermittent errors usually mean the underlying cloud provider, I sometimes get pull errors and even timeouts - used to get them on heat as well. There are issues with nexus3 servers periodically due to load, upgrades and I have heard about a serious regional issue with mirrors. I do not know the cloud provider that these servers run on - the issue may be there. The script is pretty simple - it greps all the values.yaml files for docker names and images - there were issues where it parsed incorrectly and tried to pull just the image name or just the image version - but these were fixed - hopefully no more issues with the sh script.
There also may be issues with docker itself with 80 parallel pulls - we likely should add a -serial flag - to pull in sequence - it would be less performant.
you can do the following on a clean system to see the parallel pulls in progress and/or count them
ps -ef | grep docker | grep pull | wc -l
In the end there should be no issues because anything not pulled in the prepull will just get pulled when the docker containers are run via kubectl - they will just start slower the first time.
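The serial pull suggested above could look roughly like this. It is a sketch, not the prepull script itself: it assumes the image names are already available one per line on stdin, and `PULL_CMD` is a hypothetical override that makes dry runs possible.

```shell
# pull_serial
# Reads image names from stdin and pulls them one at a time,
# counting failures instead of stopping at the first one.
# PULL_CMD (hypothetical) replaces "docker pull", e.g. PULL_CMD=echo for a dry run.
pull_serial() {
  failed=0
  while IFS= read -r image; do
    [ -z "$image" ] && continue
    # stdin is redirected so the pull command cannot consume the image list
    if ! ${PULL_CMD:-docker pull} "$image" < /dev/null; then
      echo "FAILED: $image" >&2
      failed=$((failed + 1))
    fi
  done
  echo "failed pulls: $failed"
}
```

Pulling in sequence is slower, but the failure count makes it easy to tell whether a node ended up with fewer images because individual pulls failed.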
please note that there are a couple "huge" images on the order of 1-2G one of them for SDNC - and i have lately seen issues bringing up SDNC on a clean system - required a ./deleteAll.bash -n onap -a sdnc and re ./createAll.
Another possibility is that docker is optimizing or rearranging the pulls and running into issues depending on the order.
Another issue is that the 4 different servers have different image sets - as the docker images | wc -l may be picking up server or client images only present on one or more of the nodes - if you look at a cluster of 4 servers - I have one - then the master has a lot more images than the other 4 and the other 3 clients usually run different combinations of the 6 kubernetes servers - for what reason I am still looking at - before you even bring up the onap containers.
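To see which images actually differ between two nodes, rather than only comparing counts, the image lists can be diffed. A sketch, assuming each node's list was saved to a text file with one `repository:tag` per line; the helper name and the file names are hypothetical.

```shell
# On each node, save the image list first, for example:
#   docker images --format '{{.Repository}}:{{.Tag}}' > node1-images.txt

# images_only_in_first A B
# Prints the images listed in file A that do not appear in file B.
images_only_in_first() {
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from B
  grep -Fxv -f "$2" "$1" | sort -u
}
```

Running it in both directions (A against B, then B against A) shows whether the difference is ONAP images or Kubernetes/Rancher system images.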
lets watch this - there is enough writing here to raise a JIRA - which I will likely do.
thank you for your diligence
/michael
James Forsyth
Michael O'Brien - I am trying to bring up vid, robot, and aai w/ the latest oom, seeing this error on several aai pods:
Error: failed to start container "filebeat-onap-aai-resources": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/dockerdata-nfs/onap/log/filebeat/logback/filebeat.yml\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/2234aef661aa61185f7fb8fd694ec59d29f82c2478d9de1beee0a282e4af4936\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/aufs/mnt/2234aef661aa61185f7fb8fd694ec59d29f82c2478d9de1beee0a282e4af4936/usr/share/filebeat/filebeat.yml\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
The config job seems to have failed with an error but it did create the files under /dockerdata-nfs/onap
onap config 0/1 Error 0 33m
however is this supposed to be a dir?
root@z800-kube:/dockerdata-nfs/onap/log/filebeat/logback# ls -l
total 4
drwxr-xr-x 2 root root 4096 Jan 11 17:25 filebeat.yml
Michael O'Brien
Jimmy,
Hi, good question and thank you for all your help with OOM code/config/reviews.
That particular error "not a directory" is a sort of red herring - it means one of 2 things: either the container is not finished initializing (the PVs and volume mounts are not ready yet), in which case it will go away after the pod tree is stable, or your config pod had an issue, which is not recoverable without a delete/purge. These errors occur on all pods for a while until the hierarchy of dependent pods is up and each one goes through the init cycle. However, if you see these after the normal 7-15 min startup time and they do not pass config, then you likely have an issue with the config pod pushing all the /dockerdata-nfs files (this is being removed and refactored as we speak), due to missing config in setenv.bash and onap-parameters.yaml (it must be copied to oom/kubernetes/config).
Also, that many failures usually means a config pod issue - or a full HD or RAM issue. If you have over 80G HD (you need 100G over time) and over 51G RAM, then it is a config pod issue.
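One thing that can be checked directly: Docker typically creates an empty directory on the host when a bind-mount source does not exist, so a directory named like a config file (as with filebeat.yml in the listing above) suggests the config pod never wrote that file. A rough sketch that lists such directories; the helper name and the set of extensions are assumptions.

```shell
# suspect_config_dirs ROOT
# Lists directories under ROOT whose names look like config files.
# Each hit is a path where a file was expected but a directory exists.
suspect_config_dirs() {
  find "$1" -type d \( -name '*.yml' -o -name '*.yaml' \
                       -o -name '*.xml' -o -name '*.json' \)
}

# Usage:
#   suspect_config_dirs /dockerdata-nfs/onap
```

If this prints anything after the config pod reports Completed, the safer route is still the delete/purge and config pod rerun rather than fixing the paths by hand.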
How to avoid this. See the cd.sh script attached and linked to at the top of the page - this is used to provision a system automatically on the CD servers we run the hourly jenkins job on - the script can also be used by developers wishing a full refresh of their environment (delete, re-pull, config up, pods up, run healthcheck...)
https://github.com/obrienlabs/onap-root/blob/master/cd.sh
AutomatedInstallation
If you are running the system manually - use the cd.sh script or the manual instructions at the top in detail - the usual config issue is forgetting to configure onap-parameters.yaml (you will know this by checking the config pod status). The second usual issue is failing to run setenv.bash to pick up the docker and other env variables - this will also fail the config container.
kubectl get pods --all-namespaces -a
it must say
onap config 0/1 Completed 0 1m
do the following to see any errors - usually a missing $variable set
kubectl --namespace onap logs -f config
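A scripted version of the status check, as a sketch: it reads the `kubectl get pods --all-namespaces -a` output on stdin and prints the STATUS column of the config pod. The helper name is hypothetical, and it assumes the pod is named exactly `config` in the `onap` namespace, as in the output above.

```shell
# config_pod_status
# Reads "kubectl get pods --all-namespaces -a" output from stdin and
# prints the STATUS of the onap/config pod, or "NotFound".
config_pod_status() {
  awk '$1 == "onap" && $2 == "config" { print $4; found = 1 }
       END { if (!found) print "NotFound" }'
}

# Usage:
#   kubectl get pods --all-namespaces -a | config_pod_status
```

Anything other than Completed means the logs command above is the next thing to look at.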
as of an hour ago these were the failing components - no AAI, vid or robot
As an additional reference you can refer to the running master CD job - for the times when you might think it is actually failing - not just locally.
http://jenkins.onap.info/job/oom-cd/1109/console
Also AAI has not been failing healthcheck for at least the last 7 days - actually I think since the first week of Dec 2017 - once - it is one of the most stable ONAP components
http://kibana.onap.info:5601
Let me know if this fixes your issues - if your config pod is busted - then you will need to deleteAll pods, purge the config pod and rerun setenv, config pod and createAll - see the script for the exact details
If not we can triage further
thank you
/michael
James Forsyth
Thanks Michael O'Brien, I needed to refresh the config pod and once i got "completed" I was able to get aai and several others going! Thanks for your help!
Andrew Fenner
This is a pretty basic question. I've been having some trouble with getting SDNC running (still troubleshooting) but was then looking at the readiness docker image and trying to understand how it works.
I think I understood most of it but I couldn't figure out how the value of the "K8S_CONFIG_B64" environment variable was being set, as there seems to be some "magic" for this, and I was hoping somebody could give me a hint.
Thanks
/Andrew
Michael O'Brien
Andrew, hi, just to cover off SDNC - since clustering was put in - the images have increased in number and size - there may be a timeout issue. So on a completely clean VM you may need to delete and create -a sdnc to get around this issue that only appears on slow machines (those with less than 16 cores)
if that was your issue - otherwise we need a jira
Alain Drolet
Last December (2017) I managed to deploy an almost-amsterdam version of ONAP using oom on a single Ubuntu VM.
I used a manual list of commands (cd.sh was not available at the time) as explained on this page.
The installation used:
Docker 1.12,
Rancher server 1.6.10,
Kubernetes 1.8.6,
Helm 2.3.0
Most containers came up. Over time (weeks) things degraded.
Back from the holidays I tried to reinstall from scratch (this time I'm aiming for the amsterdam branch) and had issues with Rancher.
To remove the possibility that my host was corrupted in some way,
today I used a brand new Ubuntu 16.04.4 VM and tried to create the same environment for ONAP.
I executed the commands in oom_rancher_setup_1.sh.
I executed these by hand so that I could better control the docker installation and the usermod command.
I ended up with the same problem I had on my old VM yesterday.
The problem is as follows:
In the Rancher Environment GUI I created a Kubernetes environment.
Once I made it the default the State became "Unhealthy".
Rancher won't tell you why!
Then I tried anyway to add a host.
When running the command:
The agent started to complain that it could not connect to the server.
SSL certification is failing.
I get an output like this:
The Unhealthy state might be due to the web client having the same communication issue.
This does not appear to be an ONAP specific issue, since I'm failing in one of the first installation steps,
which is to get a Rancher server and agent working together.
This behavior was only observed upon my return on January 9th.
In December I had no such issue.
Could a certificate be expired?
Where are these certificates? (In the docker images I suspect)
Am I the only one with this error?
Any help will be appreciated.
Thank you
Alain
Michael O'Brien
Alain,
Hi, welcome. Also very detailed and complete environment description - appreciated.
I am extremely busy still - but your post stood out. I will return in more detail on the weekend.
For now, yes I also have had issues connecting the client - usually this involved a non static IP. for example if I saved an AMI on AWS and got a different EIP. There are several fixes for that one - use a static EIP and/or assign a domain name to it. Also you can retrofit your server - I turned off security on the CD poc for a couple days
http://amsterdam.onap.info:8880/admin/settings
change the following (not this server! just an example) in the "something else" textbox
I would hope this would work - but only if your 10.182.40.40 was changed from the original IP
Host Registration URL
What base URL should hosts use to connect to the Rancher API?
http://amsterdam.onap.info:8880
Don't include /v1 or any other path, but if you are doing SSL termination in front of Rancher, be sure to use https://.
Alain Drolet
Thank you for looking into this.
My host is a plain VMWare VM, with a fixed IP. Nothing fancy.
I'm currently doing deeper debugging of the SSL connection. I found that the rancher-agent fails in its run.sh script
on the line with
From what I understand (not confirmed) at this point, the VM should have data provided by the rancher server at /var/run/rancher. There should be various sub-dirs, some with cert data.
In the past I saw some files there, but on my new host /var/run/rancher is empty!
I think this is where server and agent share cert data (does anyone know?).
I'll keep the community posted if I find something interesting.
Michael O'Brien
Ok good, I am running a VMware 14 Workstation VM on windows at home and Fusion VMs on my macs - will look into it there as well.
/michael
Alain Drolet
Update:
I reproduced the same SSL issues using a small vagrant VM (2 CPU, 2GB).
The VagrantFile uses:
config.vm.box = "ubuntu/xenial64"
From this VM I ran the following commands:
I also tried rancher server v1.6.11. Same issues were seen.
Alain Drolet
Found it!
Of course it was a trivial mistake (that cost me a lot).
Unless you chose to use HTTPS and go through a lot of custom ssl configurations (as documented on the Rancher site) you should not use HTTPS.
Looking again at the examples on this page only HTTP is used.
I guess Chrome added HTTPS by default and sent me on this mad chase!
When connecting to the rancher server using a browser, MAKE SURE to use an HTTP URL.
E.g. :
http://<your k8s server host>:8880
Then
When adding a host for the first time, you will be presented with a page asking to confirm the "Host Registration URL".
This should be the same as the URL you used in your browser.
In any case make sure it is HTTP, NOT HTTPS.
The command you will get to add the host in step 5 should be of the form:
Since the agent is instructed to connect to the server using http, you should be fine.
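A small guard along these lines can catch the mistake before the agent is started. This is a sketch only; `check_registration_url` is a hypothetical helper and it looks at nothing but the URL scheme.

```shell
# check_registration_url URL
# Classifies the Host Registration URL by scheme. Plain http is what the
# examples on this page use; https only works with extra SSL setup in Rancher.
check_registration_url() {
  case "$1" in
    http://*)  echo "ok" ;;
    https://*) echo "warning: https needs custom SSL configuration" ;;
    *)         echo "error: not an http(s) URL" ;;
  esac
}

# Usage:
#   check_registration_url "http://<your k8s server host>:8880"
```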
Moral of the story, beware of browsers trying to help you too much!
Now I can have a nice weekend,
and move on to figuring out real ONAP issues!
:-)
Michael O'Brien
Alain,
Nice, good heads up on the http vs https host registration issue to watch for
thank you
/michael
Pavan Gupta
Hi Alain,
Did you get past the issue? I have also manually installed the components, but am unable to get ONAP up and running. It would be helpful if you could list the steps taken to install and run ONAP.
Michael O'Brien
Pavan,
Hi, welcome.
Mostly automated undercloud, helm, kubernetes, oom - AutomatedInstallation
Manual procedures QuickstartInstallation
/michael
Alain Drolet
Hi Pavan
I could post my notes.
They would look like a summary of information already on this page.
If some think it would be useful, I could do so.
In order to avoid too much redundancy on this page, could you tell us a bit more about where you have issues.
Then maybe I could post a subset of my notes around this area.
Basically I see this installation being made of 2 major steps:
After this step you should be able to go to the Rancher Web UI and see the rancher/kubernetes dockers instances and pod running.
This means running the oom_rancher_setup_1.sh, which in my case I ran manually.
Followed by some interaction in Rancher's web UI to create a k8s env, and add a host.
What do you see running or not?
Radhika Kaslikar
Hi All,
The SLI-API module for SDNC is missing from the below link, which health check makes use of.
Link to check the SLI-API : <hostIP>:<port of sdnc>/apidoc/explorer/index.html
The SLI-API module for APPC is present at the below mentioned location and the health check for it is passed.
Link to check the SLI-API : <hostIP>:<port of APPC>/apidoc/explorer/index.html
username : admin
password for both SDNC/APPC : Kp8bJ4SXszM0WXlhak3eHlcse2gAw84vaoGGmJvUy2U
The below is the snippet for SDNC and APPC health check report.
Kindly let us know how to resolve this issue.
How to make SLI-API available for SDNC, as the health check is failing for the same.
Snippet for the SLI-API missing from SDNC API doc page:
Snippet for the SLI-API PRESENT from APPC API doc page:
Michael O'Brien
Radhika, good triage - I would raise a JIRA with SDNC for this https://jira.onap.org/secure/RapidBoard.jspa?rapidView=39&view=planning
/michael
user-acfda
Hi Michael,
Thank you. We both are the same team.
You can find more supporting debugging for the same SDNC SLI-API in the attached document.
After running the installSdncDb.sh script, logging into the SDNC container and then into the SDNC database, we found that the "VLAN_ID_POOL" table does not exist, even though the database listing showed it. It was present in a stale format.
<opt/sdnc/features/sdnc-sli# cat /opt/onap/sdnc/bin/startODL.sh
${SDNC_HOME}/bin/installSdncDb.sh
Table "VLAN_ID_POOL" present in the sdnctl database:
But, upon describing the table, it shows error.
Solution : We removed the SDNC stale tables from database location and restarted the SDNC pod, it resolved the above mentioned error.
Best Regards,
Shubhra
Syed Atif Husain
Is there a wiki or video for openstack setup needed for onap oom to openstack connectivity?
I am struggling to connect the oom vm to the openstack vm and to set correct values in onap-parameters.yaml
~atif
Rahul Sharma
Hey Syed Atif Husain,
I followed this page and it helped sort out the issues. See the comments section for details on the onap parameters.
Syed Atif Husain
Rahul Sharma I followed the steps on the link above but I am facing issues related to connectivity to Openstack. I guess I am missing some basic setup in my openstack.
I have created a network and subnet on openstack. I am using their IDs in the param file for OPENSTACK_OAM_NETWORK_ID and OPENSTACK_OAM_SUBNET_ID respectively. What should I use for OPENSTACK_PUBLIC_NET_ID? Do I have to create another network? How do I ensure my ONAP VM is able to connect to the Openstack VM? [I have installed ONAP OOM on one Azure VM and Openstack on another VM].
Any pointers to these are highly appreciated.
Rahul Sharma
Syed Atif Husain: OPENSTACK_PUBLIC_NET_ID should be one of the networks on your Openstack that's publicly accessible. One of the public IPs assigned to your vFW_x_VNF (x = SINC or PG) would belong to this network.
You don't need to create other networks: unprotected_private_net_id (zdfw1fwl01_unprotected), unprotected_private_subnet_id(zdfw1fwl01_unprotected_sub), protected_private_net_id(zdfw1fwl01_protected), protected_private_subnet_id(zdfw1fwl01_protected_sub) would be created as part of vFW_SINC stack deployment.
The "pub_key" attribute will be used to communicate with the VM on Openstack.
Note: the values sent in the SDNC-Preload step are used to create the stack; so if you want to update something, you can do it then.
Also, when I tested, my ONAP was running on Openstack; running ONAP on Azure should be similar considering that MultiVIM should take care of different platforms underneath but you can verify in that area. Have a look at the VF instantiation flow for Release 1.1 here
Syed Atif Husain
Rahul Sharma Hi I tried the alternate steps on "ONAP on Kubernetes on Rancher in OpenStack" but I am getting an issue in step 4 'Create the Kubernetes host on OpenStack'
When I execute the curl command, the host appears in kub but it says 'waiting for ssh to be available' and it fails after 60 retries.
I have opened all ports and I am able to ssh to the openstack VM manually.
Pls advise
Rahul Sharma
Can you check if the 'K8S_FLAVOR', 'PRIVATE_NETWORK_NAME' etc. exist on your Openstack? What is the output of the curl command?
It's also advisable to post the query on the confluence page where you are facing the issue; that way it would help others.
Syed Atif Husain
I have posted my reply on that page, values of variables are correct
Pavan Gupta
Hello,
When I run cd.sh, the config pod isn't coming up. It's shown to be in error state. Does anyone know why this happens? In the kubectl logs, I see the following error: "DEPLOY_DCAE" must be set in onap-parameters.yaml.
Syed Atif Husain
Hey Pavan Gupta
You need to give the dcae related params in the onap-parameters.yaml file. Otherwise remove the dcae component from HELM_APPS in oom/kubernetes/oneclick/setenv.bash if you don't want to install dcae or if your openstack setup is not ready.
Refer to the manual instructions under the section 'quickstart installation'
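Before starting the config pod, the parameters file can be checked for keys that are absent or left empty. A sketch, assuming the file uses flat `KEY: value` lines as in the sample; `missing_params` is a hypothetical helper and the key list is whatever you pass in.

```shell
# missing_params FILE KEY...
# Prints each KEY that has no "KEY: value" line in FILE
# (a key with nothing after the colon counts as missing).
missing_params() {
  file="$1"
  shift
  for key in "$@"; do
    grep -Eq "^${key}:[[:space:]]*[^[:space:]#]" "$file" || echo "$key"
  done
}

# Usage:
#   missing_params onap-parameters.yaml DEPLOY_DCAE OPENSTACK_PUBLIC_NET_ID
```

This only checks that a value is present, not that it is valid for your Openstack.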
Pavan Gupta
Hi Syed,
I am using VMware ESXi host to bring up Ubuntu VM. will it work?
Pavan
Michael O'Brien
Pavan,
Just to be sure, I checked that the yaml was not changed recently
https://git.onap.org/oom/tree/kubernetes/config/onap-parameters-sample.yaml
I won't have time until later today to check - but if the config container complains about a missing DCAE variable - then there is a chance the config yaml is missing it
Did you also source
https://git.onap.org/oom/tree/kubernetes/oneclick/setenv.bash
However currently the CD job is OK with the latest master (are you on Amsterdam by chance?)
http://jenkins.onap.info/job/oom-cd/1388/console
I also installed a clean machine yesterday with no issues - verify your onap-parameters.yaml file against the sample.
these work - just replace your keystone config for your openstack instance for VNFs
/michael
-----Original Message-----
From: Michael O'Brien
Sent: Tuesday, January 23, 2018 07:04
To: 'Pavan Gupta' <pavan.gupta@calsoftinc.com>
Subject: RE: Issues with cd.sh sciprt
Pavan,
Hi, the script mirrors the manual instructions and runs ok on several servers including the automated CD server.
You place the 2 aai files, the onap-configuration.yaml file beside the cd.sh script and run it (this assumes you have run the rancher config ok)
I would need the error conditions pasted to determine if you missed a step - likely during the config pod bootstrap - could you post the errors on the config pod you see.
Also verify all versions and prerequisites, Rancher 1.6.10, helm 2.3.x, docker 1.12.x, Kubernetes 1.8.x
Try to come to the OOM meeting and/or raise a JIRA and we can look at it from there.
DCAE is in flux but there should be no issues with the 2.0.0 tag for the config container
/michael
-----Original Message-----
From: Pavan Gupta [mailto:pavan.gupta@calsoftinc.com]
Sent: Tuesday, January 23, 2018 06:09
To: Michael O'Brien <Frank.Obrien@amdocs.com>
Subject: Issues with cd.sh sciprt
Hi Michael,
I have posted this query on the wiki page as well. I could get the installation script working and moved onto running cd.sh. Config pod is shown in error state. I looked at the Kubernetes log and it says DEPLOY_DCAE should be set in the onap-parameters.yaml file. I tried setting this parameter, but the error still continues. Any idea what's going wrong or what needs to be done to resolve this issue?
Pavan
Pavan Gupta
Michael,
My onap-parameters.yaml has been taken from https://git.onap.org/oom/tree/kubernetes/config/onap-parameters-sample.yaml. I am doing the setup on a VMware ESXi host. Just wondering how the Openstack parameters will be used in this case. Has anyone set up ONAP on a VMware ESXi host?
Pavan
Alain Drolet
Which branch/version are you trying to install?
In my case I'm focussing on `amsterdam`.
For this you need to pick the sample file from the amsterdam branch.
Last time I checked the amsterdam branch and the master branch version were very different.
I used this command to fetch the sample file (and save it under the correct name):
Michael O'Brien
I have setup onap via OOM via Rancher on VMware Workstation 14 and VMware Fusion 8 with no issues
The config in onap-parameters.yaml must point to an openstack user/pass/tenant so that you can create a customer/tenant/region in AAI as part of the vFW use case. You can use any openstack or Rackspace config - you only need keystone to work until you get to SO instantiation.
In the future we will be able to configure Azure or AWS credentials via work being done in the Multicloud repo.
/michael
Andrew Fenner
Success!!!!
Hi, I got to the point of getting a VNF deployed using the Kubernetes deployment, so I just wanted to let you know it can work in different environments.
I'm using Rancher and a host VM on a private Red Hat OpenStack.
There were a couple of local workarounds, and I had to redeploy AAI as it didn't come up the first time.
However SDNC didn't work and I had to change it from using the NFS server to using the Kubernetes volumes, as I was getting an error in the nfs-provisioner-.... pod referring to all the ports, but I think I have them all open etc.
Why is volume handling for SDNC different to the other namespaces ?
/Andrew
Rahul Sharma
Andrew Fenner, Hi,
Volume handling for SDNC is done differently for 2 reasons:
Not sure why nfs-provisioner isn't starting for you when you have the ports open?
user-acfda
Hi Michael,
List:
Vendor name : MyVendor
License agreement : MyLicenseAgreement
Entitlementpool : MyEntitlementPool
Service : vFW-vSINK-service
VSP : vFW-vSINK
2. After running the init robot testcase, we can notice that only the default services are being listed. The service , which we created using SDC, is not visible in AAI.
3. The curl queries for SDC are not working. We tried many curl queries for the same, to fetch the service name/instance.
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/sdc2/rest/v1/consumers/
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/sdc/v1/catalog/services
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/
https://{serverRoot}/sdc/v1/catalog/{assetType}?{filterKey}={filterValue}
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" http://localhost:30205/sdc/v1/catalog/services/vFW-vSINK-service
curl -X GET -i -H "Accept: application/json; charset=UTF-8" -H "Content-Type: application/json" -H "USER_ID: cs0008" https://127.0.0.1:30205/sdc/v1/registerForDistribution -u"cs0008:demo123456!"
Any help would be appreciated!
Best Regards,
Shubhra
Pavan Gupta
Can we install ONAP on Ubuntu 16.04 VM. Onap-parameters.yaml has 14.04 version mentioned. Will that make any difference in installation?
Pavan
Alain Drolet
Hi Pavan
I did my deployment on Ubuntu 16.04.4 with no issue related to the host OS version.
Michael O'Brien
Pavan, Hi, that ubuntu 14 version is a left over from the original heat parameters - it was used to spin up VM's (the original 1.0 heat install had a mix of 14/16 VMs - don't know why we don't also list the 16 version - you can ignore it as we are only using docker containers in Kubernetes right now.
The only reason we are targeting 16.04 is it is the recommended version of our Kubernetes manager RI (Rancher for now) - you can also use Kubeadm - http://rancher.com/docs/rancher/v1.6/en/installing-rancher/installing-server/#single-container
/michael
Michael O'Brien
Heads up that we can now use Helm 2.6+ - verified 2.7.2, working on 2.8.0 - so that tpl templating can be used 20180124:0800EST master branch.
Openstack, Rackspace, AWS EC2 (pending Azure VM, GCE VM)
current validated config is Rancher 1.6.10+, Helm 2.7.2, Kubernetes 1.8.6, Docker 1.12
In progress - Rancher 1.6.14, Helm 2.8.0, Kubernetes 1.8.6, Docker 17.03.2 - OK on Rackspace and AWS EC2/EBS
Need to verify 1.9.0
OOM-486
Pavan Gupta
After the installation, I tried http://10.22.4.112:30211 on the browser and the ONAP portal didn't open up. Not all services are shown 1/1 (please check the output below)
kubectl get pods --all-namespaces
onap-vfc vfc-ztevnfmdriver-726786078-jc7b4 0/1 ImagePullBackOff 0 12h
onap-aaf aaf-1993711932-h3q31 0/1 Running 0 12h
I am not sure why I can't see the ONAP portal now.
The following is the error message from Kubernetes. It is not able to pull the container image.
Failed to pull image "nexus3.onap.org:10001/onap/vfc/ztevnfmdriver:v1.0.2": rpc error: code = 2 desc = Error: image onap/vfc/ztevnfmdriver:v1.0.2 not found
Error syncing pod
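With this many pods it can help to filter the listing down to the ones that are not fully ready. A minimal sketch, assuming the default column layout of `kubectl get pods --all-namespaces` (NAMESPACE NAME READY STATUS RESTARTS AGE); the function name `not_ready_pods` is made up for illustration:

```bash
# Print namespace, pod name, READY and STATUS for every pod whose READY
# column is not n/n. Reads `kubectl get pods --all-namespaces` output on stdin.
not_ready_pods() {
  awk '$1 != "NAMESPACE" && NF >= 4 {
         split($3, r, "/")
         if (r[1] != r[2]) print $1, $2, $3, $4
       }'
}

# Usage (requires a working kubectl context):
#   kubectl get pods --all-namespaces | not_ready_pods
```

Pods that ran to completion and show 0/1 would also be listed by this filter, so check the STATUS column before treating an entry as a failure.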
user-acfda
Pavan Gupta
Check the values.yaml file (for example oom/kubernetes/portal/values.yaml) of the respective ONAP component (say vfc, portal, MSO, etc.) and look for the prepull policy option.
Set it to Always.
Then do a docker pull for the respective image.
Best Regards,
Shubhra
Marcus Williams
I'm seeing kube2msb pod failing to come up when deploying oom using './cd.sh -b amsterdam' :
onap-kube2msb kube2msb-registrator-1382931887-565pz 0/1 CrashLoopBackOff 8 17m
kube2msb logs:
1/25/2018 11:06:59 AM2018-01-25 19:06:59.777976 I | Using https://kubernetes.default.svc.cluster.local:443 for kubernetes master
1/25/2018 11:06:59 AM2018-01-25 19:06:59.805097 I | Could not connect to Kube Masterthe server has asked for the client to provide credentials
Has anyone seen this issue or know how to solve it?
Rancher v1.6.10
kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.7-rancher1", GitCommit:"a1ea37c6f6d21f315a07631b17b9537881e1986a", GitTreeState:"clean", BuildDate:"2017-10-02T21:33:08Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
helm version
Client: &version.Version{SemVer:"v2.3.0", GitCommit:"d83c245fc324117885ed83afc90ac74afed271b4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.3.0", GitCommit:"d83c245fc324117885ed83afc90ac74afed271b4", GitTreeState:"clean"}
docker 1.12.6
Alexis de Talhouët
I guess this is on Amsterdam. You need to update the kube2msb deployment file with your K8S token. In Rancher, under your environment, go to Kubernetes → CLI → Generate Config; this should give you the token to authenticate to the K8S API for your deployment.
Marcus Williams
Thanks Alexis - I tried exactly what you suggested but it still wasn't working (thus the above post).
It is now working. I did two things and I'm not sure which fixed the issue:
user-acfda
Hi Marcus Williams
In the threads below, see my detailed response on the CrashLoopBackOff error state.
It worked for us and resolved the issue.
In short, put back the backup of the dockerdata-nfs folder and then do a clean delete of the ONAP pods.
Then delete the dockerdata-nfs folder and bring up fresh ONAP pods.
See the response below.
ravi rao
After completing ./createAll.bash -n onap I see every pod up and running except for
onap-portal vnc-portal-845d84676c-jcdmp 0/1 CrashLoopBackOff 17 1h
Logs Indicate that x11vnc exited:
stored passwd in file: /.password2
/usr/lib/python2.7/dist-packages/supervisor/options.py:297: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2018-01-25 21:47:52,310 CRIT Supervisor running as root (no user in config file)
2018-01-25 21:47:52,310 WARN Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2018-01-25 21:47:52,354 INFO RPC interface 'supervisor' initialized
2018-01-25 21:47:52,357 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-01-25 21:47:52,357 INFO supervisord started with pid 44
2018-01-25 21:47:53,361 INFO spawned: 'xvfb' with pid 51
2018-01-25 21:47:53,363 INFO spawned: 'pcmanfm' with pid 52
2018-01-25 21:47:53,365 INFO spawned: 'lxpanel' with pid 53
2018-01-25 21:47:53,368 INFO spawned: 'lxsession' with pid 54
2018-01-25 21:47:53,371 INFO spawned: 'x11vnc' with pid 55
2018-01-25 21:47:53,373 INFO spawned: 'novnc' with pid 56
2018-01-25 21:47:53,406 INFO exited: x11vnc (exit status 1; not expected)
2018-01-25 21:47:54,681 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: pcmanfm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: lxpanel entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,681 INFO success: lxsession entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:54,683 INFO spawned: 'x11vnc' with pid 68
2018-01-25 21:47:54,683 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-25 21:47:56,638 INFO success: x11vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Has anyone seen this problem ??
Regards,
Ravi
Winnie Tsang (IBM)
Hi Ravi,
I encountered the same problem too. Did you find a solution or workaround for this issue yet?
Best Regards,
Winnie
user-acfda
Hi Winnie Tsang (IBM) ravi rao
The ONAP system/pods enter the CrashLoopBackOff state only when you delete the dockerdata-nfs data for the respective ONAP component.
For example, rm -rf /dockerdata-nfs/portal has been run. Now the ONAP system has no way of knowing which data to delete, so there are uncleaned/dangling links.
Solution:
For the vnc-portal, I faced a similar issue today:
kubectl describe po/<container-for-vnc-portal> -n onap-portal
Note: the respective image will be missing.
docker pull <image name>
Kubernetes will pick up the newly pulled docker image. The issue for vnc-portal will be resolved.
Best Regards,
Shubhra
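For the manual pull described above, the image names can be read straight out of the describe output instead of being copied by hand. This is only a sketch: it assumes the usual `Image:` lines printed by `kubectl describe pod`, and `images_from_describe` is a made-up helper name:

```bash
# Print the unique image references found in `kubectl describe pod` output
# read from stdin. Matches "Image:" lines only, not "Image ID:" lines.
images_from_describe() {
  awk '$1 == "Image:" { print $2 }' | sort -u
}

# Usage (the pod name is a placeholder):
#   kubectl describe po/<container-for-vnc-portal> -n onap-portal \
#     | images_from_describe \
#     | while read -r image; do docker pull "$image"; done
```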
ravi rao
Hi Shubhra,
Thanks for the detailed steps. I pulled all the docker images that the portal app depends on and I still see the same error
onap-portal vnc-portal-56c8b774fb-wvv2d 0/1 PostStartHookError 4 1m
The main issue is that with this error I cannot get to the portal-vnc and hence cannot access the portal UI. Any help is greatly appreciated.
Regards,
Ravi
Michael O'Brien
Guys, it helps if you post your versions (onap branch, helm version, kubernetes version, rancher version, docker version), whether your config container ran OK (0/1 Completed), and whether you have all dependent containers up (for example vnc-portal needs vid to start).
A common issue is helm related (helm 2.5+ running on amsterdam - stick to 2.3 on that branch).
For example, only master works with helm 2.5+.
OOM-441
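When collecting versions, a quick check that the helm client and server agree can save a round trip. A rough sketch, assuming the Helm 2 `helm version` output format shown elsewhere in this thread (`Client: &version.Version{SemVer:"..."}`); both function names are made up for illustration:

```bash
# Extract the SemVer string from Helm 2 `helm version` output lines, e.g.
# 'Client: &version.Version{SemVer:"v2.3.0", ...}' -> v2.3.0
helm_semver() {
  sed -n 's/.*SemVer:"\([^"]*\)".*/\1/p'
}

# Read full `helm version` output on stdin, print both versions, and
# succeed only if the Client and Server lines report the same version.
helm_versions_match() {
  local out client server
  out=$(cat)
  client=$(printf '%s\n' "$out" | grep '^Client:' | helm_semver)
  server=$(printf '%s\n' "$out" | grep '^Server:' | helm_semver)
  echo "client=$client server=$server"
  [ -n "$client" ] && [ "$client" = "$server" ]
}

# Usage:
#   helm version | helm_versions_match || echo "helm client/server mismatch"
```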
ravi rao
Hi Michael,
Below are details in my env..
ubuntu@onap-rancher-vm:~/oom/kubernetes/oneclick$ helm version
Client: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"}
When you say helm 2.5+, are you referring to the server version or the client? I only installed helm client v2.1.3 and I think Rancher installs the helm server.
The ONAP branch I am using is amsterdam.
All the pods are up and running except for the vnc-portal container in the onap-portal namespace and the elasticsearch container in onap-log.
ubuntu@onap-rancher-vm:~/oom/kubernetes/oneclick$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-76b8cd7b5-zjk6h 1/1 Running 0 2d
kube-system kube-dns-5d7b4487c9-6srpr 3/3 Running 0 2d
kube-system kubernetes-dashboard-f9577fffd-sjmst 1/1 Running 0 2d
kube-system monitoring-grafana-997796fcf-k5hfs 1/1 Running 0 2d
kube-system monitoring-influxdb-56fdcd96b-sm5vt 1/1 Running 0 2d
kube-system tiller-deploy-cc96d4f6b-gjm6p 1/1 Running 0 2d
onap-aaf aaf-849d477595-rxhfk 0/1 Running 0 1d
onap-aaf aaf-cs-6f989ff9cb-g9xrg 1/1 Running 0 1d
onap-aai aai-resources-64cc9b6757-wjq7v 2/2 Running 0 1d
onap-aai aai-service-8cd946dbf-mxt9l 1/1 Running 5 1d
onap-aai aai-traversal-984d55b6d-75dst 2/2 Running 0 1d
onap-aai data-router-df8bffd44-lfnv8 1/1 Running 0 1d
onap-aai elasticsearch-6b577bf757-rpdqn 1/1 Running 0 1d
onap-aai hbase-794b5b644d-gdsh9 1/1 Running 0 1d
onap-aai model-loader-service-6684c846db-g9hsl 2/2 Running 0 1d
onap-aai search-data-service-77bdb5f849-hjn56 2/2 Running 0 1d
onap-aai sparky-be-69d5667b5f-k6tck 2/2 Running 0 1d
onap-appc appc-86cc48f4c4-q8xgw 2/2 Running 0 1d
onap-appc appc-dbhost-7bd58565d9-fqrvs 1/1 Running 0 1d
onap-appc appc-dgbuilder-78746d5b75-t8988 1/1 Running 0 1d
onap-clamp clamp-5fdf8b7d5f-2mckp 1/1 Running 0 1d
onap-clamp clamp-mariadb-64dd848468-snmmh 1/1 Running 0 1d
onap-cli cli-6885486887-hcvgj 1/1 Running 0 1d
onap-consul consul-agent-5c744c8758-8spjs 1/1 Running 1 1d
onap-consul consul-server-687f6f6556-cz78t 1/1 Running 2 1d
onap-consul consul-server-687f6f6556-vl7lj 1/1 Running 2 1d
onap-consul consul-server-687f6f6556-xb8kt 1/1 Running 1 1d
onap-dcaegen2 heat-bootstrap-6b8db64547-gzcnd 1/1 Running 0 1d
onap-dcaegen2 nginx-7ddc7ffc78-lvt7s 1/1 Running 0 1d
onap-esr esr-esrgui-68cdbd94f5-x26vg 1/1 Running 0 1d
onap-esr esr-esrserver-7fd9c6b6fc-8dwnd 1/1 Running 0 1d
onap-kube2msb kube2msb-registrator-8668c8f5b9-qd795 1/1 Running 0 1d
onap-log elasticsearch-6df4f65775-9b45s 0/1 CrashLoopBackOff 539 1d
onap-log kibana-846489d66d-98fz8 1/1 Running 0 1d
onap-log logstash-68f8d87968-9xc5c 1/1 Running 0 1d
onap-message-router dmaap-59f79b8b6-kx9kj 1/1 Running 1 1d
onap-message-router global-kafka-7bd76d957b-bpf7l 1/1 Running 1 1d
onap-message-router zookeeper-7df6479654-psf7b 1/1 Running 0 1d
onap-msb msb-consul-6c79b86c79-9krm9 1/1 Running 0 1d
onap-msb msb-discovery-845db56dc5-zq849 1/1 Running 0 1d
onap-msb msb-eag-65bd96b98-vbtrx 1/1 Running 0 1d
onap-msb msb-iag-7bb5b74cd9-5bx4m 1/1 Running 0 1d
onap-mso mariadb-5879646dd5-mb98c 1/1 Running 0 1d
onap-mso mso-7bfc5cf78c-28llb 2/2 Running 0 1d
onap-multicloud framework-6877c6f4d-xv6rm 1/1 Running 0 1d
onap-multicloud multicloud-ocata-5c955bcc96-6qjhz 1/1 Running 0 1d
onap-multicloud multicloud-vio-5bccd9fdd7-qcjzq 1/1 Running 0 1d
onap-multicloud multicloud-windriver-5d9bd7ff5-n7grp 1/1 Running 0 1d
onap-policy brmsgw-dc766bd4f-9mrgf 1/1 Running 0 1d
onap-policy drools-59d8499d7d-jck5l 2/2 Running 0 1d
onap-policy mariadb-56ffbf5bcf-hf9f5 1/1 Running 0 1d
onap-policy nexus-c89ccd7fc-n4g9j 1/1 Running 0 1d
onap-policy pap-586bd544d7-gxtdj 2/2 Running 0 1d
onap-policy pdp-78b8cbf8b4-fh2hf 2/2 Running 0 1d
onap-portal portalapps-7c488c4c84-8x4t9 2/2 Running 0 1h
onap-portal portaldb-7f8547d599-hwcp7 1/1 Running 0 1h
onap-portal portalwidgets-799dfd79f6-5q85k 1/1 Running 0 1h
onap-portal vnc-portal-56c8b774fb-dl46s 0/1 CrashLoopBackOff 20 1h
onap-robot robot-959b68c94-7n9kh 1/1 Running 0 1d
onap-sdc sdc-be-6bf4f5d744-xk5l6 2/2 Running 0 1d
onap-sdc sdc-cs-6bfc44d4fc-s5nnz 1/1 Running 0 1d
onap-sdc sdc-es-69f77b4778-th98q 1/1 Running 0 1d
onap-sdc sdc-fe-84646b4bff-fczlr 2/2 Running 0 1d
onap-sdc sdc-kb-5468f987d9-5wklh 1/1 Running 0 1d
onap-sdnc dmaap-listener-5956b4c8dc-9c4wm 1/1 Running 0 1d
onap-sdnc sdnc-968d56bcc-6q24c 2/2 Running 0 1d
onap-sdnc sdnc-dbhost-7446545c76-lkhj6 1/1 Running 0 1d
onap-sdnc sdnc-dgbuilder-55696ffff8-6mtqh 1/1 Running 0 1d
onap-sdnc sdnc-portal-6dbcd7c948-tqtj9 1/1 Running 0 1d
onap-sdnc ueb-listener-66dc757b5-f4r6m 1/1 Running 0 1d
onap-uui uui-578cd988b6-m7v72 1/1 Running 0 1d
onap-uui uui-server-576998685c-sb6kk 1/1 Running 0 1d
onap-vfc vfc-catalog-6ff7b74b68-6j4q8 1/1 Running 0 1d
onap-vfc vfc-emsdriver-7845c8f9f-w2vgf 1/1 Running 0 1d
onap-vfc vfc-gvnfmdriver-56cf469b46-wsg4r 1/1 Running 0 1d
onap-vfc vfc-hwvnfmdriver-588d5b679f-zpcj6 1/1 Running 0 1d
onap-vfc vfc-jujudriver-6db77bfdd5-qz4fk 1/1 Running 0 1d
onap-vfc vfc-nokiavnfmdriver-6c78675f8d-4k5mx 1/1 Running 0 1d
onap-vfc vfc-nslcm-796b678d-nvvfd 1/1 Running 0 1d
onap-vfc vfc-resmgr-74f858b688-shkzw 1/1 Running 0 1d
onap-vfc vfc-vnflcm-5849759444-fcrft 1/1 Running 0 1d
onap-vfc vfc-vnfmgr-77df547c78-lwp97 1/1 Running 0 1d
onap-vfc vfc-vnfres-5bddd7fc68-s6spr 1/1 Running 0 1d
onap-vfc vfc-workflow-5849854569-sd249 1/1 Running 0 1d
onap-vfc vfc-workflowengineactiviti-699f669db9-s99n8 1/1 Running 0 1d
onap-vfc vfc-ztesdncdriver-5dcf694c4-fsdf2 1/1 Running 0 1d
onap-vfc vfc-ztevmanagerdriver-6c8d776f5c-68spg 1/1 Running 0 1d
onap-vid vid-mariadb-575fd8f48-x95t6 1/1 Running 0 1d
onap-vid vid-server-6cdf654d86-x72lc 2/2 Running 0 1d
onap-vnfsdk postgres-5679d856cf-gz5d4 1/1 Running 0 1d
onap-vnfsdk refrepo-7d9665bd47-cv6h5 1/1 Running 0 1d
ravi rao
onap branch - Amsterdam
helm versions
Client: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.1", GitCommit:"bbc1f71dc03afc5f00c6ac84b9308f8ecb4f39ac", GitTreeState:"clean"}
Kubernetes version
ubuntu@onap-rancher-vm:~/oom/kubernetes/oneclick$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.5-rancher1", GitCommit:"6cb179822b9f77893eac5612c91a0ed7c0941b45", GitTreeState:"clean", BuildDate:"2017-12-11T17:40:37Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
rancher version - 1.6.14
Docker version on rancher VM - 17.03.0-ce
Docker version on Kubernetes VM - 1.12.6
Regards,
Ravi
Michael O'Brien
Hi, your versions are mixed across your rancher/kubernetes servers - try collocating first until your system is up.
See the top of the page.
You need helm 2.6.1 in order to run the tpl templates in the yamls in master.
You need to run an older version of helm (2.3.1) on amsterdam so that vnc-portal will start up.
Release
Kubernetes
Helm
Kubectl
Docker
Kumar Lakshman Kumar
Hi Ravi,
Did you get the vnc-portal container in the onap-portal namespace and the elasticsearch container in onap-log working?
I resolved my elasticsearch container issue by running sudo sysctl -w vm.max_map_count=262144 on the node where the pod is running.
But still no luck with vnc-portal: in the log I see the x11vnc process keeps getting restarted, and I am not sure how to fix this issue.
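The sysctl change above can be guarded so it only runs when needed. A small sketch, assuming the 262144 threshold from this comment; `map_count_too_low` is a hypothetical helper name:

```bash
# Succeed if the given vm.max_map_count value is below the required minimum.
# The 262144 default is the value quoted in the comment above.
map_count_too_low() {
  local current=$1 required=${2:-262144}
  [ "$current" -lt "$required" ]
}

# Usage on the Kubernetes host (the sysctl write needs root):
#   if map_count_too_low "$(cat /proc/sys/vm/max_map_count)"; then
#     sudo sysctl -w vm.max_map_count=262144
#   fi
```

A value set with sysctl -w does not survive a reboot, so it would need to be added to /etc/sysctl.conf as well to persist.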
ravi rao
Hi Kumar,
I followed the instructions specified in the below post by kranthi to solve the problem.
NOTE: The main reason for this issue is that I did not have the recommended versions of helm, rancher and kubernetes. It was not so easy to align the versions, so I tried the fix suggested below and it worked for me. You can also try it and see if it solves your issue.
Regards,
Ravi..
kranthi guttikonda
I had the same problem with the Amsterdam branch. The master branch has fixes to resolve this. Basically, the helm chart defines a lifecycle PostStart hook, which may run before the container itself has started (the ordering is not guaranteed). So please take the portal folder from the master branch and use it to replace the one in Amsterdam, or just replace the resources folder inside portal and the portal-vnc-dep.yaml file inside the templates folder with the versions from master.
helm delete --purge onap-portal
cd oom/kubernetes
helm install --name onap-portal ./portal
Michael O'Brien
elasticsearch memory issue - fixed
OOM-511
Portal issue (helm related) - fixed
OOM-486
OOM-441
Guys, follow or use as a reference the scripts below - they will create a rancher environment and install onap on either amsterdam or master (use your own onap-parameters.yaml)
entrypoint
OOM-710
rancher install
OOM-715
onap install
OOM-716
user-acfda
See the below response.
Santosh Thapa Magar
Hi Michael O'Brien,
Sorry for the trouble.
I am a beginner to ONAP.
I want to install ONAP in an AWS environment.
But as I went through your video I found that I need the onap-parameters.yaml file, which includes the OpenStack credentials.
Do I need this for installing ONAP in an AWS environment?
I want to install ONAP on an AWS instance only.
Is it optional, or must I have OpenStack credentials?
Please help.
Best Regards,
Santosh
Michael O'Brien
Santosh,
Hi, no, you can put fake user/pass/token strings there for now. When you get to the point of running the use cases - like the vFW - and need to create a customer/tenant/region on AAI, this is where real credentials will be required to authenticate to Keystone. Later, when you orchestrate VNFs via SO, full functionality will be required.
For now use the sample one in the repo.
let us know how things work out. And don't hesitate to ask questions about AWS in your case when bringing up the system.
thank you /michael
Santosh Thapa Magar
Hi Michael O'Brien
Thank you very much for the reply.
I will use the sample you provided.
Before I start installing ONAP, can you please help me understand the need for a domain name for the installation?
Can't I use an Elastic IP only?
And about the use cases: can you let me know which use cases will work under this installation of ONAP on Kubernetes without having OpenStack credentials?
Thanks a lot.
Best Regards,
Santosh
Michael O'Brien
You can use a routable IP or a domain - I use a domain so I don't have to remember the IPs of my servers
Santosh Thapa Magar
Hi Michael O'Brien
Sorry to bother you.
I started to install ONAP on AWS as per your guidance, and I was able to install rancher, docker and helm.
But when I run cd.sh I get the following errors.
Can you have a look and suggest a solution?
PS: I have used the same onap-parameters.yaml provided in this repo.
********************************************************************************************************
root@ip-10-0-1-113:~# ./cd.sh -b release-1.1.0
Wed Jan 31 06:53:31 UTC 2018
provide onap-parameters.yaml and aai-cloud-region-put.json
vm.max_map_count = 262144
remove existing oom
./cd.sh: line 20: oom/kubernetes/oneclick/setenv.bash: No such file or directory
./cd.sh: line 22: oom/kubernetes/oneclick/deleteAll.bash: No such file or directory
Error: incompatible versions client[v2.8.0] server[v2.6.1]
sleeping 1 min
deleting /dockerdata-nfs
chmod: cannot access '/dockerdata-nfs/onap': No such file or directory
pull new oom
Cloning into 'oom'...
fatal: Remote branch release-1.1.0 not found in upstream origin
start config pod
./cd.sh: line 43: oom/kubernetes/oneclick/setenv.bash: No such file or directory
moving onap-parameters.yaml to oom/kubernetes/config
cp: cannot create regular file 'oom/kubernetes/config': No such file or directory
./cd.sh: line 47: cd: oom/kubernetes/config: No such file or directory
./cd.sh: line 48: ./createConfig.sh: No such file or directory
verify onap-config is 0/1 not 1/1 - as in completed - an error pod - means you are missing onap-parameters.yaml or values are not set in it.
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete
No resources found.
waiting for config pod to complete....
************************************************************************************************
Michael O'Brien
Santosh,
Hi, there are 2 errors above
fatal: Remote branch release-1.1.0 not found in upstream origin
release-1.1.0 was deleted a month ago - yes I had a comment in my cd.sh script as an example for master or that release - I will update the comment to print "amsterdam" - so there is no confusion
for reference here are the active branches
https://gerrit.onap.org/r/#/admin/projects/oom,branches
the rest of the errors are because the git clone did not work - no files
./cd.sh: line 47: cd: oom/kubernetes/config: No such file or directory
./cd.sh: line 48: ./createConfig.sh: No such file or directory
do the following and you will be ok
./cd.sh -b master
or
./cd.sh -b amsterdam
/michael
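Since the rest of the cd.sh errors follow from the failed clone, the branch name can be checked before running the script. A minimal sketch that works on `git ls-remote --heads` output; `branch_exists` is a made-up name, and the clone URL in the usage line is an assumption, so substitute whichever remote your cd.sh clones from:

```bash
# Succeed if branch $1 appears in `git ls-remote --heads` output read from
# stdin. Each line of that output has the form "<sha><TAB>refs/heads/<branch>".
branch_exists() {
  awk -v ref="refs/heads/$1" '$2 == ref { found = 1 } END { exit !found }'
}

# Usage (URL is an assumption):
#   git ls-remote --heads https://gerrit.onap.org/r/oom | branch_exists amsterdam \
#     || echo "branch not found upstream"
```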
Pavan Gupta
Hello,
Any help is appreciated. If required, we can do a remote desktop session.
Attachments: cd script output.rtf, kubectl get pods.rtf
Michael O'Brien
Pavan,
Check your cd script output.rtf - you are not running the correct helm version (likely you are running 2.3 - should be running 2.6+ - ideally 2.8.0)
For the vnf image pull - have not looked at this - verify the right tag is being pulled from nexus3 and close off the JIRA if you find it.
If you look at your logs - you will see you have the right # of non-running containers (2) but you will notice that some of your createAll calls are failing on the new template tpl code added last week (yes the author of that change should have notified the community of the pending change - I picked up the comm task later that day).
like the following
Error: parse error in "appc/templates/appc-conf-configmap.yaml": template: appc/templates/appc-conf-configmap.yaml:8: function "tpl" not defined
The command helm returned with error code 1
Check this page for the right version - it changed on Wed.
I sent out this notice as well to the onap-discuss newsgroup
https://lists.onap.org/pipermail/onap-discuss/2018-January/007674.html
for
OOM-552
https://gerrit.onap.org/r/#/c/28291/
thank you
/michael
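The "tpl not defined" parse error above is what an older helm produces on master, so a minimum-version check before createAll can catch it early. A sketch only: `version_at_least` is a hypothetical helper, and the 2.6.0 floor comes from the "Helm 2.6+" note on this page:

```bash
# Succeed if version $1 (e.g. v2.7.2) is at least version $2 (e.g. 2.6.0).
# Compares major, minor and patch numerically; a leading "v" is ignored.
version_at_least() {
  awk -v have="${1#v}" -v want="${2#v}" 'BEGIN {
    split(have, h, "."); split(want, w, ".")
    for (i = 1; i <= 3; i++) {
      if (h[i] + 0 > w[i] + 0) exit 0
      if (h[i] + 0 < w[i] + 0) exit 1
    }
    exit 0
  }'
}

# Usage:
#   version_at_least v2.3.0 2.6.0 || echo "helm too old for tpl templates on master"
```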
Andrew Fenner
Hi,
I got a closed loop UC running with an OOM deployment. I used the workaround for DCAE/VES as outlined in "DCAE mS Deployment (Standalone instantiation)".
I've attached the helm files I made for this workaround. If you just expand them into ..../oom/kubernetes you should get a directory called ves, and then you can just run ../oneclick/createAll.bash -n onap -a ves
/Andrew
Attachment: ves-oom.tar
Bharath Thiruveedula
Hi Andrew Fenner, it's nice to see that it works for you. I have an OOM setup without DCAE. Can I now download ves-oom.tar and create the pod? How can I make the other components point to this standalone DCAE model? We have to change vFWCL.zip to give the DCAE collector IP and port, right? Can you give more details on the closed loop end?
Andrew Fenner
The file is attached in the last post. The VES and CDAP are integrated with the other components via the k8s DNS. The way to expose the VES port is using
kubectl expose services ves-vesserver --type=LoadBalancer --port 8080 --target-port=8080 --name=vesfrontend -n onap-ves
I should work out how to add this to the helm templates.
/Andrew
Bharath Thiruveedula
Sure Andrew. And one more question: are you creating the "/ves/DmaapConfig.json" file? I couldn't find it in the tar. Am I missing something here?
Andrew Fenner
Sorry. I missed explaining that step.
I created a file /dockerdata-nfs/onap/ves/DmaapConfig.json
with the content below.
This overrides the default and means you can update the location of the dmaap host. What's below should work if you have the default namespace names.
{
"channels": [
{
"name": "sec_measurement",
"cambria.topic": "unauthenticated.SEC_MEASUREMENT_OUTPUT",
"class": "HpCambriaOutputStream",
"stripHpId": "true",
"type": "out",
"cambria.hosts": "dmaap.onap-message-router"
},
{
"name": "sec_fault",
"cambria.topic": "unauthenticated.SEC_FAULT_OUTPUT",
"class": "HpCambriaOutputStream",
"stripHpId": "true",
"type": "out",
"cambria.hosts": "dmaap.onap-message-router"
}
]
}
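If the namespace prefix is not the default, only the two cambria.hosts values should need to change. The sketch below generates the same file with the host as a parameter; `dmaap_config` is a made-up helper, and the alternative host name in the usage line is purely hypothetical:

```bash
# Print a DmaapConfig.json body with both channels pointing at DMaaP host $1.
# Defaults to the host used in the post above.
dmaap_config() {
  local host=${1:-dmaap.onap-message-router}
  cat <<EOF
{
  "channels": [
    {
      "name": "sec_measurement",
      "cambria.topic": "unauthenticated.SEC_MEASUREMENT_OUTPUT",
      "class": "HpCambriaOutputStream",
      "stripHpId": "true",
      "type": "out",
      "cambria.hosts": "$host"
    },
    {
      "name": "sec_fault",
      "cambria.topic": "unauthenticated.SEC_FAULT_OUTPUT",
      "class": "HpCambriaOutputStream",
      "stripHpId": "true",
      "type": "out",
      "cambria.hosts": "$host"
    }
  ]
}
EOF
}

# Usage:
#   dmaap_config > /dockerdata-nfs/onap/ves/DmaapConfig.json
#   dmaap_config dmaap.demo-message-router   # hypothetical non-default host
```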
user-acfda
Where in ONAP do we need to upload/untar the ves-oom.tar file?
Andrew Fenner
Hi,
I didn't use the vFWCL.zip as I got a different type of closed loop running for an internal VNF.
The files go in
.../oom/kubernetes
i.e. alongside the files for all the other namespaces.
You still have to load the TCA application into the CDAP server in much the same way as in the referenced workaround page.
/Andrew
user-acfda
When we were doing the SDNC preload operation for SINK and PG, we noticed that with the modified JSON files for SINK (our values for the VNF details, service instance, etc.), the existing/predefined vFWCL instance got changed. Is that correct?
user-acfda
Hi All,
We are facing the error (Init:ErrImageNeverPull) for all the ONAP components. Can anybody help with how to rectify the error?
Michael O'Brien
Shubhra,
Image pull errors usually mean you cannot reach nexus3.onap.org - especially that many - which could be caused by your proxy (switch to a cell connection to verify).
Do a manual docker pull to check this.
Another reason could be that you did not source setenv.bash, where the docker repo credentials/url are set.
Verify that these are set.
user-acfda
Thank you so much Michael. Sourcing the setenv.bash file resolved the issue for most of the ONAP components.
But some of the components, like vnc-portal, elasticsearch, pap-policy etc., were still showing the same error.
Doing a manual pull of the image resolved the issue, but for elasticsearch the issue still persists.
For elasticsearch, the system state is below. I do have the docker image for it in the system, but it still shows the error ImageNeverPull.
For policy-pe and policy-drools, I have pulled the latest docker images manually.
But for brmsgw (onap-policy), I have no idea which image to pull.
Can you suggest something?
onap-policy brmsgw-2679573537-z6pp8 0/1 Init:0/1 36 6h
onap-policy drools-1375106353-1q1ch 0/2 Init:0/1 36 6h
Also, do I need to run the command "docker run image-name" after pulling the images? Where do the latest pulled images go?
I have pulled the image for vnc-portal, but now the system is NOT showing a docker image for it. What went wrong?
I did a docker pull for the images below, but they are not listed in docker images.
Conclusion:
We have resolved the above-mentioned issue by pulling the docker images that were missing from the system for the respective components.
Michael O'Brien
Shubhra,
Remember this is Kubernetes not Docker. Kubernetes is a layer on top of Docker - you don't need to run any docker commands except when installing the Rancher wrapper on Kubernetes - after that always use kubectl
Follow the instructions on this wiki "exactly" or use the scripts for your first time install
Pulling docker images yourself is not required - the only reason for the prepull is to speed up the onap startup - for example running the createAll a second time will run faster since the images were pulled earlier.
The images that the values.yaml files reference are the ones pulled automatically by Kubernetes - you don't need later versions unless there are app fixes we have not switched to yet.
If you are having issues with docker pulls then the cause is in your system behind your firewall - I can't remember if it was you (I answer a lot of support questions here) - did you do a proper source of setenv.bash, and did you make sure your config pod is OK?
If you really want to see ONAP working, just to verify your procedure, run it on a VM in a public cloud like AWS or Azure and then apply that to your local environment. I am thinking that there may be an issue pulling from nexus3 - I have seen this in other corp environments.
/michael
user-f6250
Hi All,
I followed the instructions above to run ONAP on Kubernetes, where the server and client are co-located.
I have two issues regarding the implementation:
demo-aai model-loader-service-5c9d84589b-6pz5q 2/2 Running 0 4h
demo-aai search-data-service-6fc58fd7cc-qhzdc 2/2 Running 0 4h
demo-aai sparky-be-684d6759bc-jl5wx 2/2 Running 0 4h
demo-mso mso-6c4dd64bf9-nhdjs 2/2 Running 2 4h
demo-sdnc sdnc-dbhost-0 2/2 Running 1 4h
demo-vid vid-server-56d895b8c-2nctp 2/2 Running 0 3h
2. In the next step, I followed the vnc-portal video, but the portal pod is not available there either. I tried to add the portal, but an error comes up saying that the portal already exists. In addition, I am looking for the ete-k8s.sh file in dockerdata-nfs, but there are no files there except eteshare and robot!
Can anyone help me fix these two issues?
Rahul Sharma
user-f6250, Hi,
For 1. Yes, policy and portal should appear in the above 'kubectl' result. I would recommend checking your setenv.bash under $HOME/oom/kubernetes/oneclick to see which HELM_APPS you are deploying. Make sure it has policy and portal in there.
For 2. ete-k8s.sh is present under $HOME/oom/kubernetes/robot, not under dockerdata-nfs. eteshare under dockerdata-nfs/onap/robot would contain the logs of the run when you execute ete-k8s.sh.
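The HELM_APPS check for point 1 can be scripted. A rough sketch, assuming setenv.bash defines HELM_APPS as a bash array (worth confirming in your copy of the file); `in_list` is a made-up helper name:

```bash
# Succeed if $1 equals one of the remaining arguments.
in_list() {
  local wanted=$1 item
  shift
  for item in "$@"; do
    [ "$item" = "$wanted" ] && return 0
  done
  return 1
}

# Usage, assuming HELM_APPS is a bash array set by setenv.bash:
#   source "$HOME/oom/kubernetes/oneclick/setenv.bash"
#   for app in policy portal; do
#     in_list "$app" "${HELM_APPS[@]}" || echo "$app is not in HELM_APPS"
#   done
```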
user-f6250
Hi Rahul Sharma
Regarding the first issue: Policy and Portal are there.
Regarding the second issue: I just followed the instructions for the vnc-portal. The video shows that ete-k8s.sh must appear in dockerdata-nfs when running ./createAll.bash -n demo
Because of the portal, I cannot check the AAI endpoints and run the health check!
Any idea?
Rahul Sharma
user-f6250:
user-f6250
Rahul Sharma
I think I have mistakenly created two instances: one based on the instructions provided in ONAP on Kubernetes (onap) and a second one based on the vnc-portal instructions (demo). Should I delete one of the instances, for example demo? If yes, please tell me what command I should use!
If I delete one instance, does it affect the other one?
When I ran kubectl get pods -n onap-portal for onap, I received the following messages:
root@omap:~/oom/kubernetes/robot# kubectl get pods -n onap-portal
NAME READY STATUS RESTARTS AGE
portalapps-dd4f99c9b-lbm7w 0/2 Init:Error 0 24m
portaldb-7f8547d599-f2wlv 0/1 CrashLoopBackOff 5 24m
portalwidgets-6f884fd4b4-wl84p 0/1 Init:Error 0 24m
vnc-portal-687cdf7845-clqth 0/1 Init:0/4 1 24m
But for demo it is:
root@omap:~/oom/kubernetes/robot# kubectl get pods -n demo-portal
No resources found.
In the other case, when I run the health check (as you mentioned), I receive the following message:
root@omap:~/oom/kubernetes/robot# ./ete-k8s.sh health
No resources found.
error: expected 'exec POD_NAME COMMAND [ARG1] [ARG2] ... [ARGN]'.
POD_NAME and COMMAND are required arguments for the exec command
See 'kubectl exec -h' for help and examples.
Thanks for your kind help!
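The exec error above looks like what kubectl prints when the pod name passed to it is empty, which would fit a namespace that has no robot pod. A small guard can make that explicit before running the health check. This is a sketch under that assumption; `first_pod` is a made-up helper and not part of ete-k8s.sh:

```bash
# Print the name of the first pod whose name starts with prefix $1, reading
# `kubectl get pods -n <namespace>` output on stdin; fail if none is found.
first_pod() {
  awk -v p="$1" 'index($1, p) == 1 { print $1; found = 1; exit }
                 END { exit !found }'
}

# Usage:
#   pod=$(kubectl get pods -n onap-robot | first_pod robot) \
#     || echo "no robot pod found in onap-robot; is the robot app deployed?"
```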
Rahul Sharma
I am not sure about the demo-portal. But yes, if the ports are already being used, there would be conflicts when launching similar pod again.
I would recommend clearing up and starting afresh.
Here is what I would do:
user-f6250
Rahul Sharma
I am trying to delete the containers but I faced the following error!
Should I first evacuate the host in Rancher or leave it as it is?
root@omap:~/oom/kubernetes/oneclick# ./deleteAll.bash -n demo -y
********** Cleaning up ONAP:
release "demo-consul" deleted
namespace "demo-consul" deleted
clusterrolebinding "demo-consul-admin-binding" deleted
Service account demo-consul-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-msb" deleted
clusterrolebinding "demo-msb-admin-binding" deleted
Service account demo-msb-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-mso" deleted
clusterrolebinding "demo-mso-admin-binding" deleted
Service account demo-mso-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-message-router" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-message-router-admin-binding" not found
Service account demo-message-router-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-sdnc" deleted
clusterrolebinding "demo-sdnc-admin-binding" deleted
Service account demo-sdnc-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-vid" deleted
clusterrolebinding "demo-vid-admin-binding" deleted
Service account demo-vid-admin-binding deleted.
release "demo-robot" deleted
namespace "demo-robot" deleted
clusterrolebinding "demo-robot-admin-binding" deleted
Service account demo-robot-admin-binding deleted.
E0201 09:24:42.090532 5895 portforward.go:331] an error occurred forwarding 32898 -> 44134: error forwarding port 44134 to pod 9b031662eac045462b5e018cc6829467a799568021c3a97dfe8d7ec6272e1064, uid : exit status 1: 2018/02/01 09:24:42 socat[7805] E connect(6, AF=2 127.0.0.1:44134, 16): Connection refused
Error: transport is closing
namespace "demo-portal" deleted
clusterrolebinding "demo-portal-admin-binding" deleted
Service account demo-portal-admin-binding deleted.
Error: release: "demo-policy" not found
namespace "demo-policy" deleted
clusterrolebinding "demo-policy-admin-binding" deleted
Service account demo-policy-admin-binding deleted.
Error: release: "demo-appc" not found
Error from server (NotFound): namespaces "demo-appc" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-appc-admin-binding" not found
Service account demo-appc-admin-binding deleted.
release "demo-aai" deleted
namespace "demo-aai" deleted
clusterrolebinding "demo-aai-admin-binding" deleted
Service account demo-aai-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo-sdc" deleted
clusterrolebinding "demo-sdc-admin-binding" deleted
Service account demo-sdc-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-dcaegen2" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-dcaegen2-admin-binding" not found
Service account demo-dcaegen2-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-log" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-log-admin-binding" not found
Service account demo-log-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-cli" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-cli-admin-binding" not found
Service account demo-cli-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-multicloud" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-multicloud-admin-binding" not found
Service account demo-multicloud-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-clamp" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-clamp-admin-binding" not found
Service account demo-clamp-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-vnfsdk" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-vnfsdk-admin-binding" not found
Service account demo-vnfsdk-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-uui" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-uui-admin-binding" not found
Service account demo-uui-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-aaf" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-aaf-admin-binding" not found
Service account demo-aaf-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-vfc" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-vfc-admin-binding" not found
Service account demo-vfc-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-kube2msb" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-kube2msb-admin-binding" not found
Service account demo-kube2msb-admin-binding deleted.
Error: could not find a ready tiller pod
Error from server (NotFound): namespaces "demo-esr" not found
Error from server (NotFound): clusterrolebindings.rbac.authorization.k8s.io "demo-esr-admin-binding" not found
Service account demo-esr-admin-binding deleted.
Error: could not find a ready tiller pod
namespace "demo" deleted
Waiting for namespaces termination...
Apart from that, I tried to delete demo and onap, but again without success.
Here is the error for the second command (./deleteAll.bash -n onap):
root@omap:~/oom/kubernetes/oneclick# ./deleteAll.bash -n demo
Current kubectl context does not match context specified: ONAP
You are about to delete deployment from: ONAP
To continue enter context name: demo
Your response does not match current context! Skipping delete ...
root@omap:~/oom/kubernetes/oneclick#
Michael O'Brien
Some of the earlier errors are normal; I have seen them on half-deployed systems.
If the following shows pods still up (except the 6 for kubernetes) even after a helm delete --purge, then you could also start from scratch: delete all of your kubernetes and rancher docker containers.
kubectl get pods --all-namespaces -a
Docker DevOps#Dockercleanup
Also try to follow the tutorial here "exactly" if this is your first time running onap, or use the included scripts; you won't have any issues that way.
Also, just to be safe, use the "onap" namespace, because there may be some hardcoding of "onap": it was hardcoded in places under helm 2.3 because we could not use the tpl template until 2.6 (we only upgraded to 2.8 last week).
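The check described above (are any pods still present outside the kubernetes system pods after a delete?) can be sketched in Python. This is only a minimal sketch: the function name is hypothetical, and it assumes the column layout that kubectl prints in this thread (NAMESPACE NAME READY STATUS RESTARTS AGE) and that the system pods all live in kube-system.

```python
def leftover_pods(kubectl_output, system_namespaces=("kube-system",)):
    """Return (namespace, name, status) for every pod outside the system namespaces.

    kubectl_output is the text printed by
    `kubectl get pods --all-namespaces -a`.
    """
    leftovers = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full pod row,
        # for example the "No resources found." message.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        if namespace not in system_namespaces:
            leftovers.append((namespace, name, status))
    return leftovers
```

Feeding it the saved output of the kubectl command gives a list that should be empty on a clean system; any entry (such as a pod stuck in Terminating) is a candidate reason to start from scratch.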
user-f6250
Michael O'Brien
I am totally new to ONAP. I followed the tutorial exactly, but once I tried to add vnc-portal, the errors came up. The vnc-portal instructions say to create a demo instance for the portal, which conflicts with the onap instance (it seems that running two instances is complicated!).
As you suggested, I deleted the pods, but one of them is still in the Terminating state. Should I ignore it, or should I start from scratch?
root@omap:~/oom/kubernetes/oneclick# kubectl get pods --all-namespaces -a
NAMESPACE NAME READY STATUS RESTARTS AGE
demo-sdnc sdnc-dbhost-0 0/2 Terminating 1 2d
kube-system heapster-76b8cd7b5-z99xr 1/1 Running 0 3d
kube-system kube-dns-5d7b4487c9-zc5tx 3/3 Running 735 3d
kube-system kubernetes-dashboard-f9577fffd-c8bgs 1/1 Running 0 3d
kube-system monitoring-grafana-997796fcf-mgqd9 1/1 Running 0 3d
kube-system monitoring-influxdb-56fdcd96b-pnbrj 1/1 Running 0 3d
kube-system tiller-deploy-74f6f6c747-7cvth 1/1 Running 373 3d
Michael O'Brien
Everything is normal except for the failed SDNC container deletion. I saw this on another system 2 days ago: something went into master for SDNC that caused it. For that particular machine I deleted the VM and raised a new spot VM; a helm delete --purge had no effect, and even killing the docker container outside of kubernetes had no effect. I had notes on this and will raise a JIRA. The next system I raised for the CD jobs did not have the issue anymore.
OOM-653
http://jenkins.onap.info/job/oom-cd/1594/console
Essentially this will block any future deployment
demo-sdnc sdnc-dbhost-0 0/2 Terminating 1 2d
user-f6250
Michael O'Brien, Rahul Sharma
As Michael suggested, I started from scratch. I received the following error when executing this command on each host:
root@omap:~# sudo docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.9 http://rackspace.onap.info:8880/v1/scripts/CDE31E5CDE3217328B2D:1514678400000:xLr2ySIppAaEZYWtTVa5V9ZGc
INFO: Running Agent Registration Process, CATTLE_URL=http://rackspace.onap.info:8880/v1
INFO: Attempting to connect to: http://rackspace.onap.info:8880/v1
ERROR: http://rackspace.onap.info:8880/v1 is not accessible (Failed to connect to rackspace.onap.info port 8880: Connection refused)
ERROR: http://rackspace.onap.info:8880/v1 is not accessible (Failed to connect to rackspace.onap.info port 8880: Connection refused)
ERROR: http://rackspace.onap.info:8880/v1 is not accessible (Failed to connect to rackspace.onap.info port 8880: Connection refused)
ERROR: http://rackspace.onap.info:8880/v1 is not accessible (Failed to connect to rackspace.onap.info port 8880: Connection refused)
Can you help me with what I have to do, and what the reason for this is? (Please keep in mind that I am running the server and the client in the same VM.)
Michael O'Brien
Hamzeh,
Hi, the DNS name rackspace.onap.info is my own domain - it is just an example - I use a domain to avoid an IP address. In your case you will need to use the IP address of your VM to launch the UI and register a host and not the IP/DNS of my host.
If I still had that system up - then you would have actually registered your host to one of my own OOM deployments and my pods would have started appearing on your system - impossible because our two deployments use different generated client tokens anyway.
user-f6250
Michael O'Brien
When I used the following command to add portal to ONAP (./createAll.bash -n onap -a robot), I got a CrashLoopBackOff in the portal pods, and the rest of the portal pods stay in the Init state for a long time:
root@omap:~/oom/kubernetes/config# kubectl get pods -n onap-portal
NAME READY STATUS RESTARTS AGE
portalapps-dd4f99c9b-t8hwz 0/2 Init:0/3 1 17m
portaldb-7f8547d599-ppjmx 0/1 CrashLoopBackOff 7 17m
portalwidgets-6f884fd4b4-w2pc7 0/1 Init:0/1 1 17m
vnc-portal-687cdf7845-95bq7 0/1 Init:0/4 1 17m
Can you tell me what causes this and how I can solve these two problems?
Michael O'Brien
Hamzeh,
Looks like you are mixing namespaces and pods - you have 2 namespaces (in effect you are bringing up 2 different onap installations, in namespaces -n onap and -n onap-portal). Do
./createAll.bash -n onap
to bring everything up (recommended)
or
./createAll.bash -n onap -a robot
./createAll.bash -n onap -a portal
to bring up portal and robot - but if you check the portal yaml you will see it has dependencies - try to bring up all of onap, or the subset in the helm_apps variable in setenv.sh (you need aai, vid, etc. for the portal pods to come up)
/michael
user-f6250
Michael O'Brien Thanks for your kindness and your helpful answers.
As you suggested, I am bringing up all the namespaces related to the portal. So far all the namespaces are up except one, which fails with an error. Could you tell me what this error is and how I should solve it?
root@omap:~/oom/kubernetes/oneclick# ./createAll.bash -n onap -a sdnc
********** Creating instance 1 of ONAP with port range 30200 and 30399
********** Creating ONAP:
********** Creating deployments for sdnc **********
Creating namespace **********
namespace "onap-sdnc" created
Creating service account **********
clusterrolebinding "onap-sdnc-admin-binding" created
Creating registry secret **********
secret "onap-docker-registry-key" created
Creating deployments and services **********
E0205 14:45:32.429304 32344 portforward.go:331] an error occurred forwarding 36188 -> 44134: error forwarding port 44134 to pod 45eb9cfb4133dd7aa42454821eb8ad61fe179e3ad1375e22fd9f5ade6b2a2c2f, uid : exit status 1: 2018/02/05 14:45:31 socat[1991] E connect(5, AF=2 127.0.0.1:44134, 16): Connection refused
Error: transport is closing
The command helm returned with error code 1
so far sdnc is not showing in my namespaces...
Thanks
Michael O'Brien
Hamzeh, if you have 64G of RAM, just run everything at once.
If you want all of onap, run the following; you will see an aaf and a vfc container error - ignore these.
./createAll.bash -n onap
when you see the intermittent sdnc issues (as long as they are related to image pulls) - restart sdnc
./deleteAll.bash -n onap -a sdnc -y
./createAll.bash -n onap -a sdnc
Follow the flow in the cd.sh example
https://github.com/obrienlabs/onap-root/blob/master/cd.sh
there are some intermittent issues with SDNC
OOM-653
OOM-614
OOM-543
OOM-544
OOM-537
Not after the following but after a change TBD that also renamed sdnc-dbhost-0
SDNC-163
/michael
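The delete-then-create restart of a single component described above could be wrapped as follows. This is a sketch only: the helper names are hypothetical, the flags (-n, -a, -y) are the ones used in this thread, and it assumes the scripts are run from the oneclick directory.

```python
import subprocess


def bounce_commands(namespace, app):
    """Build the delete-then-create command pair for one ONAP component."""
    return [
        ["./deleteAll.bash", "-n", namespace, "-a", app, "-y"],
        ["./createAll.bash", "-n", namespace, "-a", app],
    ]


def bounce(namespace, app, dry_run=True, cwd="."):
    """Print the commands, and run them in order unless dry_run is set."""
    for command in bounce_commands(namespace, app):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first script that exits non-zero.
            subprocess.run(command, cwd=cwd, check=True)
```

Calling bounce("onap", "sdnc") only prints the two commands; pass dry_run=False and cwd pointing at oom/kubernetes/oneclick to actually run them.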
Santosh Thapa Magar
Hi all,
I installed ONAP on an AWS r3 instance yesterday.
I used the master branch for the deployment.
There were 91 pods, of which 8 failed.
This is the list of pods that failed.
Can you suggest how to fix the issues with the failed pods?
If anybody with a similar issue has found a solution, your help will be greatly appreciated.
Best Regards,
Santosh
Michael O'Brien
Santosh,
Hi, that is normal/known behaviour. For aaf and vfc - those 2 don't work and have been busted for at least 6 weeks - ignore them - they are being fixed and you don't need them for any vFW or vDNS activity.
For all the rest - the SDNC containers - this is a known intermittent issue with deployments behind a slow network connection (for example, I don't get these on AWS) - the readiness probes have timed out before the docker images were pulled (only one needs to fail and the rest in the hierarchy tree are waiting for it).
The fix is to delete sdnc and recreate it:
./deleteAll.bash -n onap -a sdnc
./createAll.bash -n onap -a sdnc
Both of these issues are documented in several places and on onap-discuss.
For example, on a latest clean install - on the first run - same issue - 2nd run OK - but periodically we still get an SDNC issue because the pull time is so close to the readiness timeout retries.
Would be nice to fix these intermittent deploy issues (usually on clean vm's)
http://jenkins.onap.info/job/oom-cd/1598/console
http://jenkins.onap.info/job/oom-cd/1599/console
/michael
Jun Hu
Hi Santosh, Michael
I hit a similar issue, with sdnc-dbhost-0 stuck in Pending because the PersistentVolumeClaim "sdnc-data-sdnc-dbhost-0" is not bound.
In my case, /dockerdata-nfs is already mounted, so the nfs-provisioner gets an error.
To fix that issue, please change the following in oom/kubernetes/sdnc/templates/nfs-provisoner-deployment.yaml:
      volumes:
        - name: export-volume
          hostPath:
-           path: /dockerdata-nfs/{{ .Values.nsPrefix }}/sdnc/data
+           path: /nfs-provisioner/{{ .Values.nsPrefix }}/sdnc/data
+           type: DirectoryOrCreate
Nicolas
Michael O'Brien
Nice - good JIRA candidate - looking for an existing JIRA or raising one
Would be nice to fix these intermittent deploy issues (usually on clean vm's)
http://jenkins.onap.info/job/oom-cd/1598/console
http://jenkins.onap.info/job/oom-cd/1599/console
Andrew Fenner
Well found.
Thanks.
user-acfda
Hi All,
Is the VF-C component needed/required to run the vfirewall demo?
Is the multi-vim component needed/required to run the vfirewall demo?
Or can we still run the vfirewall demo without the two components mentioned above? Please clarify!
Best Regards,
Shubhra
Michael O'Brien
No, multivim has no runtime use right now - especially for the vFW - in the future, when the azure seed code comes in, it may work with SO during orchestration.
VF-C as well - not required.
You only need the original onap 1.0 level seed code components - in the diagram
Tutorial: Verifying and Observing a deployed Service Instance#vFirewallFlow
/michael
Beka Tsotsoria
Hello,
I'm unable to set up portal.
Here's the output for amsterdam branch (commit: c27640a084242f77600a8630b475772094ae314a):
last parts of kubectl describe pod vnc-portal-588f7768df-xrd7j -n onap-portal:
Here's the full log of kubectl logs portalapps-59574d47cc-mnzjn -n onap-portal -c portalapps: https://pastebin.com/5EHN9y1x
Same is for master branch (commit bce13fa0b25fb7932d5ad1be748541682329853c), except vnc-portal does not fail with /ubuntu-init/hosts: No such file or directory, rather it waits for portalapps to startup. So root cause IMO is in portalapps, possibly mysql JDBC driver is missing from classpath? telnet portaldb.onap-portal 3306 works from the portal-apps container
Here's the output of docker images:
I'm running Ubuntu 16.04.3 LTS on VirtualBox. And instead of rancher and full kubernetes, I'm using minikube with the --vm-driver=none flag (I believe this should not matter in this case). Any ideas?
kranthi guttikonda
I had the same problem with the Amsterdam branch. The master branch has fixes that resolve this. Basically, the helm chart defines a lifecycle PostStart hook, which may run before the container itself has started (the order is not guaranteed). So please take the portal folder from the master branch and replace the one in Amsterdam, or just replace the resources folder inside portal (from master) and also copy the portal-vnc-dep.yaml file inside template from master to Amsterdam.
helm delete --purge onap-portal
cd oom/kubernetes
helm install --name onap-portal ./portal
Michael O'Brien
Portal works fine in both amsterdam and beijing. You just have to stick to one of the two version sets.
If you see an issue, raise a JIRA, describe the problem, ideally provide a workaround or patch, and link to existing/related/blocked Jiras so we can review them.
See OOM-486
Amsterdam: Rancher 1.6.10, Helm 2.3, Docker 1.12, Kubectl 1.8.6
Beijing: Rancher 1.6.14, Helm 2.8, Docker 17.3, Kubectl 1.9.2
Rajeev Kaul
Is Beijing the 'master' branch? Because when I try to get OOM code for Beijing branch it does not find any.
Michael O'Brien
vnc-portal in amsterdam still has issues under newer versions such as Helm 2.5+ and rancher 1.6.11+, which are OK in master - make sure you are running the older versions that match what amsterdam runs with.
Also, vnc-portal has dependencies - do you have everything else running, like vid for example? I don't see all the dependent pods defined in the yaml for vnc-portal in your image list - just appc and msb.
Remember, portal runs against the other components in onap - try to bring the entire system up first, and then, when you are running OK, start adjusting the system one step at a time to help triage issues. Right now you are dealing with a manager different than the RI, a subset of ONAP deployed and
post your docker, kubernetes, kubectl, helm versions.
Also, to aid everyone in getting the system up, we have standardized on rancher for now - with secondary support for kubeadm.
/michael
Beka Tsotsoria
Hello Michael,
Yes I realised later that I need other components as well to run portal, so I decided to do full installation using rancher. However error message in portalapps (java.sql.SQLException: No suitable driver) makes me think there is still something wrong. Anyways I'll try full setup with rancher and let you know how it goes.
I have another issue now, I'm following QuickstartInstallation from master branch (ce7844b207021251ec76a5aa5d7b8c1de3555a12) and prepull fails with following error:
I'm not sure if I'm supposed to update values.yml file. For now I'll continue using amsterdam branch without prepulling
kranthi guttikonda
Don't you guys think this page is growing long? Perhaps we should post questions or comments in https://wiki.onap.org/questions
so that everyone benefits. I think we can move these comments over as questions. If you want, I can take care of that.
Michael O'Brien
Kranthi, hi, I know you like organizing things. I think keeping this page as is for now is ok - eventually we will archive it.
for now if you want to move to questions, we will watch that as well
thank you
/michael
Raymond Wong (IBM)
I found that the aaiServiceClusterIp is hardcoded to 10.43.255.254 in the following two yaml files.
I wonder why it is hardcoded.
Can I change it to something else? My network only allows 10.0.0.0/24, and that caused the aai deployment to fail.
If I can change it, are those two files the only places to update?
./policy/values.yaml:aaiServiceClusterIp: 10.43.255.254
./aai/values.yaml:aaiServiceClusterIp: 10.43.255.254
Thanks.
Syed Atif Husain
I created dcae as below and it came up with status Pending, but I can't find it in the list of all pods when I run "kubectl get pods --all-namespaces".
/home/onap/oom/kubernetes/oneclick# ./createAll.bash -n onap -a dcaegen2
********** Creating instance 1 of ONAP with port range 30200 and 30399
********** Creating ONAP:
********** Creating deployments for dcaegen2 **********
Creating namespace **********
namespace "onap-dcaegen2" created
Creating service account **********
clusterrolebinding "onap-dcaegen2-admin-binding" created
Creating registry secret **********
secret "onap-docker-registry-key" created
Creating deployments and services **********
secret "dcaegen2-openstack-ssh-private-key" created
configmap "dcaegen2-config-inputs" created
NAME: onap-dcaegen2
LAST DEPLOYED: Wed Feb 7 08:28:38 2018
NAMESPACE: onap
STATUS: DEPLOYED
RESOURCES:
==> v1/Pod
NAME READY STATUS RESTARTS AGE
dcaegen2 0/1 Pending 0 0s
**** Done ****
song wenjian
I use kubernetes to build ONAP. Except for AAF, all the other services come up. Now I use vnc-portal to access policy and aai, but the page is stuck.
As shown below
How can I locate the problem in this situation? Thanks.
Michael O'Brien
Hit F12 (developer mode) (more tools | developer tools | network tab) to see the underlying http and rest calls happening - it should show you what is non-200.
You should verify AAI is up by hitting the nodeport - direct rest calls can be made there without going through vnc-portal.
Hong Guan
Hi Michael O'Brien,
We have a 4-node k8s cluster with 16G RAM per node (OpenStack), and we are experiencing 'OutOfDisk' on 3 of the 4 nodes, see below. It seems the scheduler does not balance the components' memory requirements, and I do not see any memory limit configuration in the OOM deployment definition files. We will investigate the components' memory usage at runtime further. Do you have any suggestions or plans for this?
Thanks,
Hong
Michael O'Brien
Hi, you need a minimum 80 (on a single host) to bring up ONAP - you will need a shared volume across the nodes for /dockerdata-nfs
/michael
ATUL ANGRISH
HI Michael,
I am trying to spin up a vFW VF on openstack using the ONAP components.
But I am getting the error mentioned below.
Is this an issue in the latest release of OOM, or am I missing something?
I am not selecting the SDN-C Pre-load option while creating the VF.
user-acfda
In the cd.sh script, the curl queries to add an AAI region are listed (POST and GET curl queries).
I think you need to run those queries.
Vijayalakshmi H
Hi Michael O'Brien,
I am deploying OOM Amsterdam without DCAE using Rancher.
The versions for
git clone -b amsterdam https://gerrit.onap.org/r/p/oom.git
2.1 ubuntu@onapk8svm:~/oom/kubernetes/oneclick$ sudo helm version
Client: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}
ubuntu@onapk8svm:~/oom/kubernetes/oneclick$
2.2 ubuntu@onapk8svm:~/oom/kubernetes/oneclick$ sudo kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.7-rancher1", GitCommit:"a1ea37c6f6d21f315a07631b17b9537881e1986a", GitTreeState:"clean", BuildDate:"2017-10-02T21:33:08Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
2.3 ubuntu@onapk8svm:~/oom/kubernetes/oneclick$ sudo docker version
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.6.4
Git commit: 78d1802
Built: Tue Jan 10 20:38:45 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Go version: go1.6.4
Git commit: 78d1802
Built: Tue Jan 10 20:38:45 2017
OS/Arch: linux/amd64
ubuntu@onapk8svm:~/oom/kubernetes/oneclick$ sudo docker images | wc -l
120
4. Post prepull, we are facing the following errors.
Error: image onap/sparky-be:v1.1.1 not found
Error: image onap/data-router:v1.1.1 not found
Error: image onap/usecase-ui:v1.0.1 not found
Error: image mariadb:10.1.11 not found
Error: image onap/clamp:v1.1.0 not found
Error: image openecomp/dcae-collector-common-event:1.1-STAGING-latest not found
Error: image onap/policy/policy-nexus:v1.1.3 not found
Error: image openecomp/sdc-cassandra:v1.1.0 not found
Error: image onap/ccsdk-dgbuilder-image:v0.1.0 not found
Error: image onap/multicloud/framework:v1.0.0 not found
Error: image openecomp/sdc-kibana:v1.1.0 not found
Error: image openecomp/sdc-frontend:v1.1.0 not found
Error: image onap/multicloud/openstack-ocata:v1.0.0 not found
Error: image onap/aai/esr-server:v1.0.0 not found
Error: image openecomp/sdc-backend:v1.1.0 not found
Error: image openecomp/dcae-dmaapbc:1.1-STAGING-latest not found
Error: image onap/multicloud/vio:v1.0.0 not found
5. Our pod status is
ubuntu@onapk8svm:~/oom/kubernetes$ git status
ubuntu@onapk8svm:~/oom/kubernetes$ kubectl get pods --all-namespaces | grep Crash
onap-aai aai-resources-4188957633-809m7 1/2 CrashLoopBackOff 26 7h
onap-aai aai-traversal-140815912-xqfvp 0/2 CrashLoopBackOff 25 7h
onap-aai model-loader-service-911950978-v03hh 1/2 CrashLoopBackOff 31 7h
onap-aai search-data-service-2471976899-0v0ss 1/2 CrashLoopBackOff 31 7h
onap-aai sparky-be-1779663793-j2vsw 1/2 CrashLoopBackOff 31 7h
onap-mso mso-681186204-2ggzj 1/2 CrashLoopBackOff 26 7h
onap-policy drools-534015681-1vpg5 1/2 CrashLoopBackOff 27 7h
onap-policy pap-4181215123-n34d9 1/2 CrashLoopBackOff 26 7h
onap-sdc sdc-be-2336519847-q141c 1/2 CrashLoopBackOff 24 7h
onap-sdc sdc-fe-2862673798-v2h5w 1/2 CrashLoopBackOff 22 7h
onap-sdnc sdnc-1507781456-1t5q4 1/2 CrashLoopBackOff 26 7h
onap-vid vid-server-421936131-gc9zf 1/2 CrashLoopBackOff 31 7h
ubuntu@onapk8svm:~/oom/kubernetes$ kubectl get pods --all-namespaces | grep Init
onap-aai aai-service-749944520-k1q0r 0/1 Init:0/1 13 7h
onap-appc appc-dgbuilder-2298093128-4r38c 0/1 Init:0/1 13 7h
onap-policy pdp-2622241204-mb18f 0/2 PodInitializing 0 7h
onap-portal vnc-portal-1252894321-426xm 0/1 Init:1/5 1 53m
ubuntu@onapk8svm:~/oom/kubernetes$ kubectl get pods --all-namespaces | grep Error
onap-portal portalapps-1783099045-dg5rq 1/2 Error 0 53m
ubuntu@onapk8svm:~/oom/kubernetes$ kubectl get pods --all-namespaces | grep onap-portal
onap-portal portalapps-1783099045-dg5rq 1/2 Error 0 53m
onap-portal portaldb-1451233177-5v2ln 1/1 Running 0 53m
onap-portal portalwidgets-2060058548-j7n44 1/1 Running 0 53m
onap-portal vnc-portal-1252894321-426xm 0/1 Init:1/5 1 53m
ubuntu@onapk8svm:~/oom/kubernetes$ kubectl get pods --all-namespaces | grep Error
onap-portal portalapps-1783099045-dg5rq 1/2 Error 0 54m
Can you please help us understand how to fix these?
Thanks
Vijaya
Michael O'Brien
I noticed you are running helm 2.8 in amsterdam - only beijing can run the latest helm. Your environment is mixed; I would expect kubernetes 1.8 on the server. Your docker version is OK at 1.12 (beijing uses 17.3).
Amsterdam still only supports the older version set
Rancher 1.6.10
Kubernetes 1.8.6
Docker 1.12
Helm 2.3 (both client and server)
If you had that many prepull issues with docker images, I would suspect your network.
Do a full delete and create of the pods to bounce them now that the images are pulled.
If you still have issues check your proxy.
/michael
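The version-set comparison above can be sketched as a small Python check. It is a sketch under stated assumptions: the names are hypothetical, the expected values are the Amsterdam set listed in this reply, and matching on major.minor only is an assumption of the sketch, not an ONAP rule.

```python
# Amsterdam version set as listed in this reply.
AMSTERDAM = {
    "rancher": "1.6.10",
    "kubernetes": "1.8.6",
    "docker": "1.12",
    "helm": "2.3",
}


def major_minor(version):
    """Reduce 'v2.8.0' or '1.7.7-rancher1' to a (major, minor) tuple of ints."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1].rstrip("+"))


def mismatches(reported, expected=AMSTERDAM):
    """Return {component: (reported, expected)} where major.minor differs."""
    result = {}
    for component, found in reported.items():
        wanted = expected.get(component)
        if wanted is not None and major_minor(found) != major_minor(wanted):
            result[component] = (found, wanted)
    return result
```

With the server versions posted earlier in this exchange (helm v2.8.0, kubernetes v1.7.7-rancher1, docker 1.12.6), the check flags helm and kubernetes and accepts docker.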
Vijayalakshmi H
Hi Michael O'Brien,
Thanks for your suggestion. I have setup a new VM as per your suggestions.
I cloned the git repository using
git clone -b amsterdam https://gerrit.onap.org/r/p/oom.git
and got the prepull.sh from here.
https://jira.onap.org/secure/attachment/10750/prepull_docker.sh
and initiated the prepull.sh script. After 6-7 hrs I saw that no nexus images had been downloaded, so I reran "docker pull" manually on one of the images. It is extremely slow.
The prepull.sh with the higher versions of helm and kubectl was way better.
The only changes compared with the old VM are the versions of helm and kubectl. And of course the prepull_docker.sh.
Any thoughts about the cause?
Also, it would be a great help to get the count of docker images for a successful deployment ("docker images|wc -l").
Thanks
Vijaya
Michael O'Brien
The prepull script is not required, and it has nothing to do with helm and kubectl (it is just a script that parses image names and tags and then does a docker pull on each). It is there just so that all the images are available when you bring up the pods. Otherwise all the dependencies will need to wait until the images load, which usually exceeds the wait times for the pods, which means you need to bring up onap twice.
All the 95 images that the prepull script gathers from the values.yamls take 15 to 20 min on an AWS instance for me.
If you are taking hours to prepull images, this means one of two things:
1) your internal network has a proxy that is slowing things down - you don't mention if you are inside a firewall
2) you are in a region (asia) that is known to have issues pulling from the nexus3 servers (the LF hosts them in some AWS region that I don't know) - there are many reports of a mirror being required for China and India - perhaps this is your issue.
Bottom line: do a docker images and check that all the images are there - optionally turn off pulling automatically in the yamls.
As a test, can you verify that you are following the correct procedure by reproducing the RI at the top of the page:
get a spot VM on AWS in the us-west region (ohio is currently 0.07/hour for a 64g R4.2xLarge) - install oom there and you should be up in an hour (5 min for rancher/k8s/helm/docker, 20m for docker pulls, 20m to bring up onap)
/michael
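The "do a docker images and check that all the images are there" step can be sketched in Python. This is a rough sketch, not the prepull script itself: the function names are hypothetical, the required list is something you supply (for example the image:tag names from the prepull errors), and it assumes the default `docker images` column layout, where the first two columns are REPOSITORY and TAG. A local image with a registry prefix such as nexus3.onap.org:10001/ is treated as matching the bare name.

```python
def local_images(docker_images_output):
    """Collect 'repository:tag' strings from `docker images` text output."""
    found = set()
    for line in docker_images_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and any blank or malformed row.
        if len(fields) < 2 or fields[0] == "REPOSITORY":
            continue
        found.add(f"{fields[0]}:{fields[1]}")
    return found


def missing_images(required, docker_images_output):
    """Return the required images with no local copy, ignoring any registry prefix."""
    present = local_images(docker_images_output)
    missing = []
    for image in required:
        if not any(local == image or local.endswith("/" + image) for local in present):
            missing.append(image)
    return missing
```

An empty result means every required image is already on the host, so bringing up the pods should not have to wait on pulls.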
Vijayalakshmi H
Thanks Michael.
We are trying to deploy OOM on the Amsterdam branch with Rancher, without dcae, hence we have modified our setEnv.sh.
There are a couple of things we are unsure about.
(a) What is the count of docker images shown by docker images | wc -l, excluding DCAE, for the OOM Amsterdam release?
(b) What is the exact number of pods that should be up, excluding DCAE, for the OOM Amsterdam release?
-Vijaya
Vijayalakshmi H
Hi Michael O'Brien,
I have deployed DCAEGEN2. The logs in the heat-bootstrap pod have some errors:
+ openstack stack create -t /opt/heat/onap_dcae.yaml -e /opt/heat/onap_dcae.env dcae
ERROR: Property error: : resources.oam_onap_subnet.properties.cidr: : Error validating value 'DCAE_OS_OAM_NETWORK_CIDR_HERE': Invalid net cidr invalid IPNetwork DCAE_OS_OAM_NETWORK_CIDR_HERE
+ sleep 10
.simpledemo.onap.org. -f=yaml -c id
++ awk '{ print $2} '
Could not find requested endpoint in Service Catalog.
+ SIMPLEDEMO_ONAP_ORG_ZONE_ID=
......................
I have the following configuration for DCAE in onap-parameters.yaml.
######## # DCAE # ########
# Whether or not to deploy DCAE # If set to false, all the parameters below can be left empty or removed # If set to false, update ../dcaegen2/values.yaml disableDcae value to true, # this is to avoid deploying the DCAE deployments and services. DEPLOY_DCAE: "true"
# DCAE Config DCAE_DOCKER_VERSION: v1.1.1 DCAE_VM_BASE_NAME: "dcae"
# ------------------------------------------------# # OpenStack Config on which DCAE will be deployed # # ------------------------------------------------#
# Whether to have DCAE deployed on the same OpenStack instance on which VNF will be deployed. # (e.g. re-use the same config as defined above) # If set to true, discard the next config block, else provide the values.
IS_SAME_OPENSTACK_AS_VNF: "true"
# Fill in the values in below block only if IS_SAME_OPENSTACK_AS_VNF set to "false" # --- # Either v2.0 or v3
DCAE_OS_API_VERSION: ""
DCAE_OS_KEYSTONE_URL: ""
DCAE_OS_USERNAME: ""
DCAE_OS_PASSWORD: ""
DCAE_OS_TENANT_NAME: ""
DCAE_OS_TENANT_ID: ""
DCAE_OS_REGION: ""
# ---
# We need to provide the config of the public network here, because the DCAE VMs will be
# assigned a floating IP on this network so one can access them, to debug for instance.
# The ID of the public network.
DCAE_OS_PUBLIC_NET_ID: "1cb65443-e72f-4eab-8bbb-f979b8259c92"
# The name of the public network.
DCAE_OS_PUBLIC_NET_NAME: "external_network"
# This is the private network that will be used by DCAE VMs. The network will be created during the DCAE boostrap process,
# and will the subnet created will use this CIDR.
DCAE_OS_OAM_NETWORK_CIDR: "10.99.0.0/27"
# This will be the private ip of the DCAE boostrap VM. This VM is responsible for spinning up the whole DCAE stack (14 VMs total)
DCAE_IP_ADDR: "10.99.0.2"
# The flavors' name to be used by DCAE VMs
DCAE_OS_FLAVOR_SMALL: "m1.small"
DCAE_OS_FLAVOR_MEDIUM: "m1.medium"
DCAE_OS_FLAVOR_LARGE: "m1.large"
# The images' name to be used by DCAE VMs
DCAE_OS_UBUNTU_14_IMAGE: "ubuntu-14.04-server-cloudimg"
DCAE_OS_UBUNTU_16_IMAGE: "ubuntu-16.04-server-cloudimg"
DCAE_OS_CENTOS_7_IMAGE: "centos7-cloudimg"
# This is the keypair that will be created in OpenStack, and that one can use to access DCAE VMs using ssh.
# The private key needs to be in a specific format so at the end of the process, it's formatted properly
# when ending up in the DCAE HEAT stack. The best way is to do the following:
# - copy paste your key
# - surround it with quote
# - add \n at the end of each line
# - escape the result using https://www.freeformatter.com/java-dotnet-escape.html#ad-output
DCAE_OS_KEY_NAME: "keyname"
DCAE_OS_PUB_KEY: "ssh-rsa ....... Generated-by-Nova"
DCAE_OS_PRIVATE_KEY: "\"-----BEGIN RSA PRIVATE KEY-............---END RSA PRIVATE KEY-----\""
DNS_IP : "8.8.8.8"
DNS_FORWARDER: "8.8.8.8"
# Public DNS - not used but required by the DCAE boostrap container
EXTERNAL_DNS: "8.8.8.8"
# DNS domain for the DCAE VMs
DCAE_DOMAIN: "dcaeg2.onap.org"
# Proxy DNS Designate. This means DCAE will run in an instance not support Designate, and Designate will be provided by another instance.
# Set to true if you wish to use it
DNSAAS_PROXY_ENABLE: "false"
# Provide this only if DNSAAS_PROXY_ENABLE set to true. The IP has to be the IP of one of the K8S hosts.
# e.g. http://10.195.197.164/api/multicloud-titanium_cloud/v0/pod25_RegionOne/identity/v2.0
DCAE_PROXIED_KEYSTONE_URL: ""
# -----------------------------------------------------#
# OpenStack Config on which DNS Designate is supported #
# -----------------------------------------------------#
# If this is the same OpenStack used for the VNF or DCAE, please re-enter the values here.
DNSAAS_API_VERSION: "v3"
DNSAAS_REGION: "RegionOne"
DNSAAS_KEYSTONE_URL: "http://URL:5000"
DNSAAS_TENANT_ID: "b522e7abc1784e938314b978db96433e"
DNSAAS_TENANT_NAME: "user
DNSAAS_USERNAME: "user
DNSAAS_PASSWORD: "userpwd
$
Request you to help us in getting the configurations right.
Thanks
Vijaya
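The heat-bootstrap error above shows the literal token DCAE_OS_OAM_NETWORK_CIDR_HERE reaching OpenStack, which suggests the placeholder was not substituted. Below is a minimal Python sketch for sanity-checking the CIDR and the bootstrap IP before deploying. It assumes the parameters have already been loaded into a plain dict; the function name check_dcae_network is hypothetical and not part of OOM.

```python
import ipaddress

def check_dcae_network(params):
    """Return a list of problems found in the DCAE network settings."""
    problems = []
    cidr = params.get("DCAE_OS_OAM_NETWORK_CIDR", "")
    ip = params.get("DCAE_IP_ADDR", "")
    try:
        network = ipaddress.ip_network(cidr)
    except ValueError:
        # An unsubstituted placeholder such as *_HERE ends up here.
        problems.append(f"invalid CIDR: {cidr!r}")
        return problems
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"invalid IP: {ip!r}")
        return problems
    if address not in network:
        problems.append(f"{ip} is outside {cidr}")
    return problems

# Values taken from the configuration posted above.
print(check_dcae_network({
    "DCAE_OS_OAM_NETWORK_CIDR": "10.99.0.0/27",
    "DCAE_IP_ADDR": "10.99.0.2",
}))
# The placeholder seen in the heat-bootstrap log.
print(check_dcae_network({
    "DCAE_OS_OAM_NETWORK_CIDR": "DCAE_OS_OAM_NETWORK_CIDR_HERE",
    "DCAE_IP_ADDR": "10.99.0.2",
}))
```

An empty list for the posted values would mean the parameters file itself is fine and the substitution step is where to look.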
Syed Atif Husain
Michael O'Brien I have Rancher 1.6.10, Kubernetes 1.8.6, Docker 1.12 and helm 2.3
./cd.sh -b amsterdam is giving below error
**** Creating configuration for ONAP instance: onap
namespace "onap" created
Error: YAML parse error on config/templates/pod.yaml: error converting YAML to JSON: yaml: line 57: did not find expected key
**** Done ****
verify onap-config is 0/1 not 1/1 - as in completed - an error pod - means you are missing onap-parameters.yaml or values are not set in it.
The pod.yaml file is reported as invalid by yamllint.com, but the same file is present on my other machine, where ONAP runs fine.
As per my analysis, the line below in "createConfig.sh" throws the error:
helm install . --name "$1-config" --namespace $1 --set nsPrefix=$1
Pls advise.
I have another question, can we have RHEL openstack or is it compulsory to have ubuntu openstack?
Michael O'Brien
I have only tested on Ubuntu 16.04 - you are free to try on Redhat 7.3 - let us know if there are any issues by adding a section to this page when you get it working.
in your error you missed pasting what "expected key" was - as in which of the keys in setenv.sh are missing.
There should be no issues with the config pod - I ran it twice last week on amsterdam.
If your config pod fails it means any of the following
Syed Atif Husain
Michael O'Brien Resolved.
I was using the latest onap-parameters.yaml at https://github.com/obrienlabs/onap-root/blob/master/cd.sh
I replaced it with https://git.onap.org/oom/tree/kubernetes/config/onap-parameters-sample.yaml?h=amsterdam (with my updated values)
I am still facing issues with ONAP to OpenStack connectivity. Could you please point me to any notes you used to set up OpenStack, e.g. creating public/private networks, enabling ssh to OpenStack, and enabling connectivity between VMs in OpenStack? I would really appreciate that.
Michael O'Brien
I'll post my openstack heat template - based on the onap template - to help bring up a VM specific to openstack.
It is a single VM - connectivity is via the public network on your openstack. Remember that the dcae-bootstrap heatbridge will bring up all the DCAE VMs - so this is dynamic. All you need to provide is the tenant-id, tenant name, the keystone urls and the ability to create 15 EIPs - most of it is through the cloudify manager.
And the updated onap-parameters.yaml
After I scrub them
https://lists.onap.org/pipermail/onap-discuss/2018-February/008059.html
Michael O'Brien
Page has been retrofitted by a team within the OOM project yesterday.
The new structure is a simpler landing page with versions
Details on installing in openstack, azure, aws, google using rancher, kubeadm or cloudify are in sub pages
Instructions will be automated into several templates via OOM-710
vishwanath jayaraman
It would be great if there was a separate section for the Amsterdam release.
Vijayalakshmi H
Hi Michael O'Brien,
I have all 89 pods up and running (without DCAE).
However, the robot health check always fails for the ASDC component.
------------------------------------------------------------------------------
Basic ASDC Health Check | FAIL |
DOWN != UP
------------------------------------------------------------------------------
I have tried deleting and recreating the pod and also the entire OOM redeployment. The problem persists.
Fails if objects are unequal after converting them to strings.
When I proceed with the service deployment with the VNC-PORTAL, an "internal error" is thrown while distributing the service.
Any suggestions would be of great help.
Thanks
Vijaya
Michael O'Brien
First thing I would check is the logs on the container, then the CD job as a comparison, then recent commits on the OOM infrastructure side, then commits to SDC itself. After this trace through the startup and/or debug the health check to start. I can do these for you.
Vijayalakshmi H
Thanks Michael. I am attaching the logs for all SDC containers. FYI, even when I use cd.sh for deployment, the same problem is hit.
Attached the logs from sdc container. Request to kindly check and let me know the issue.
Thanks
Vijaya
Vijayalakshmi H
Hi Michael,
1. We will wait for the fix for the SDC healthcheck issue or a workaround.
2. Request you to share the onap-parameters.yaml you have used to deploy dcaegen2. I have an all-in-one openstack setup and no DNS designate/forwarder. What should be the value of the following parameters in my onap-parameters.yaml:
DNS_IP :
DNS_FORWARDER:
DNSAAS_PROXY_ENABLE: "false"
DCAE_PROXIED_KEYSTONE_URL: ""
Thanks
Vijaya
Michal Ptacek
Hi Vijaya & Michael,
I have been fighting with that U-EB issue for some time already, on various deployments (both amsterdam and master). I assume it is not just about this SDC healthcheck, but simply means that SDC can't talk to DMAAP. Is this correct? Is this a show-stopper for ONAP functionality, e.g. for getting the vFWCL demo running?
If anyone fixes this red herring, please share,
and I will do the same if I am successful.
thanks,
Michal
Michal Ptacek
I found something. In my case it was a configuration problem: the U-EB server IP is IMHO wrongly calculated in init/config-init.sh via:
kubectl get nodes -o jsonpath='{ $.items[*].status.addresses[?(@.type=="ExternalIP")].address }'
which is the external IP of the host and will never match my DMAAP service. I patched /dockerdata-nfs/onap/sdc/environments/AUTO.json, redeployed sdc, and this one passed!
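To see which address the jsonpath expression above would select, the same filter can be applied to the JSON form of the node list. This is only a sketch: it assumes the standard `kubectl get nodes -o json` layout (items, status.addresses, type/address), and the sample node and both IPs are made up for illustration.

```python
import json

def node_addresses(nodes, addr_type):
    """Collect addresses of one type from `kubectl get nodes -o json` output."""
    found = []
    for item in nodes.get("items", []):
        for entry in item.get("status", {}).get("addresses", []):
            if entry.get("type") == addr_type:
                found.append(entry["address"])
    return found

# Hypothetical node list with one node; the addresses are illustrative only.
sample = json.loads("""
{
  "items": [
    {
      "status": {
        "addresses": [
          {"type": "InternalIP", "address": "10.0.0.5"},
          {"type": "ExternalIP", "address": "203.0.113.10"}
        ]
      }
    }
  ]
}
""")

print("ExternalIP:", node_addresses(sample, "ExternalIP"))
print("InternalIP:", node_addresses(sample, "InternalIP"))
```

Comparing the two lists against the value written into AUTO.json shows quickly whether the external address was picked up.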
Abdelmuhaimen Seaudi
Hi Michal, I agree that the issue is that SDC is configured with external IP, instead of internal IP of DMAAP.
After you changed the AUTO.json file, what did you do ?
did you need to ./deleteAll -n onap -a sdc?
or did you restart sdc by using another method?
Thanks.
pranjal sharma
Hi Michael,
I am running the master (Beijing) release and I am having trouble getting one particular sdc container to run. sdc-be is showing an ImagePullBackOff error.
Can you please suggest the way forward.
Output of kubectl command of sdc-be pod as follows:
kubectl describe pod sdc-be-74488cb585-wpcdn -n onap-sdc
filebeat-onap:
Container ID:
Image: docker.elastic.co/beats/filebeat:5.5.0
Image ID:
Port: <none>
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/filebeat/data from sdc-data-filebeat (rw)
/usr/share/filebeat/filebeat.yml from filebeat-conf (rw)
/var/log/onap from sdc-logs-2 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-t94b5 (ro)
Warning Failed 32s kubelet, onap-kuber-oom Failed to pull image "docker.elastic.co/beats/filebeat:5.5.0": rpc error: code = Unknown desc = Error response from daemon: Get https://docker.elastic.co/v2/beats/filebeat/manifests/5.5.0: Get https://docker-auth.elastic.co/auth?scope=repository%3Abeats%2Ffilebeat%3Apull&service=token-service: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning FailedSync 28s (x2 over 32s) kubelet, onap-kuber-oom Error syncing pod
Normal BackOff 28s kubelet, onap-kuber-oom Back-off pulling image "docker.elastic.co/beats/filebeat:5.5.0"
Michal Ptacek
Hi Pranjal,
this looks like an issue in your local environment. I would try a manual pull,
e.g.
docker pull docker.elastic.co/beats/filebeat:5.5.0
and troubleshoot the errors (tcpdump?). Can you reach docker.elastic.co?
works for me:
# docker pull docker.elastic.co/beats/filebeat:5.5.0
5.5.0: Pulling from beats/filebeat
Digest: sha256:fe7602b641ed8ee288f067f7b31ebde14644c4722d9f7960f176d621097a5942
Status: Image is up to date for docker.elastic.co/beats/filebeat:5.5.0
Michal
Alain Drolet
I'm trying to get up to date on how to deploy the DCAE components.
From that page it looks like (at least at the beginning) DCAE could not be deployed from OOM, and needed to be deployed using a heat file in OpenStack.
I know that there is an effort to make it deployable by k8s.
What is the state of OOM k8s deployment of DCAE today?
For a design environment where we try to limit the footprint of the deployment, are there good ways to get this up and running in a simple form?
Any tips will be appreciated.
Thx
I Chen
According to Michael's answer here, DCAEGEN2 (not original dcae) is working in amsterdam branch.
You might also look at the discussion here: https://lists.onap.org/pipermail/onap-discuss/2018-February/008059.html .
Alain Drolet
Thank you I Chen
I was a bit confused with all the references to DCAE.
If I get this right OOM (Amsterdam branch) will kick-off DCAE,
but DCAE components will be deployed as VMs in an OpenStack cloud, not as k8s pods in the ONAP VM.
---
From another page I found that requirements for a full ONAP is:
The ONAP installation requires the following footprint:
What is the smallest size the environment can be (I guess here I mean only the DCAE OS cloud) to get DCAE "working" for a design deployment?
I Chen
Based on the page ONAP Deployment Specification for Finance and Operations, I think you can cobble together a setup where you have everything, except DCAE, in kubernetes in one VM, and DCAE in separate VM(s). Unfortunately, I haven't found more concrete instructions than that page.
The page Minimal Assets for Physical Lab currently lists 3 different environment sizes for different use cases. Perhaps it could be useful to you.
Sorry, I'm a newbie, and although I'm happy to share what I know and have found, I don't know much more at this time.
Alain Drolet
Thx
I'm learning the ropes as well.
BTW: the numbers pasted above are from:
https://onap.readthedocs.io/en/latest/guides/onap-developer/settingup/fullonap.html
Michael O'Brien
Guys, normally I answer all of these - have about 300% other projects besides OOM right now
You are good with 55G for all of onap (minus DCAE) - if you don't plan on doing closed loop
Otherwise you need 156G (~90G for DCAE) - to run closed loop
ONAP on Kubernetes on OpenStack#Overallrequiredresources:
Alain Drolet
Hi Michael
Thank you for taking the time to provide this clear answer and reference given your busy schedule.
The DCAE part has always been fuzzy for me.
Beka Tsotsoria
Hi Michael, Any thoughts on this please?
Thanks!
Michael O'Brien
Beka,
Hi, as Roger mentions - the teams themselves are responsible for what is running in their containers.
However, a couple observations.
You are running a higher kubectl client 1.9.2 (try to use 1.8.6) to match your server - not a big issue
You are running helm 2.8.0 - this may have an issue with the latest change in OOM-722 - try to use the RI versions 2.6.1
OOM-722
I have observed the memory footprint go to 69G after a week on a 122G VM on AWS that was idle - so yes we have essentially crossed the 64G barrier - I will update the RI requirements. On a 64G machine we now saturate to 63G within 48h.
To help you out - you can run with a reduced number of ONAP components - I have been doing this at customer sites recently.
Unless you are running advanced use cases like vVolte or vCPE you can delete the following
OOM-511
vnfsdk, aaf, vfc
https://lists.onap.org/pipermail/onap-discuss/2018-March/008555.html
Michael O'Brien
fyi, fully automated rancher install for amsterdam or master in the script under review - see instructions at the top of this page.
https://gerrit.onap.org/r/#/c/32019
OOM-715
I needed this to finish the aws/azure automation for the conference next month
Winnie Tsang (IBM)
Hi Michael,
Was the nexus3.onap.org:10001/onap/refrepo:1.0-STAGING-latest image for refrepo in the vnfsdk component just updated? I can't pull this image anymore, but I could pull it successfully last week.
This is the error message I got when I try to pull it today
"error pulling image configuration: unknown blob"
Thanks
Michael O'Brien
Sorry to hear that - could you let the VNFSDK project know (JIRA) and/or post to the onap-discuss group
Manikandan Ramachandran
Hi Michael O'Brien,
As of now what is the known stable method of Installation for ONAP?
Michael O'Brien
both the heat and oom installations are stable - if you want DCAE via OOM use only amsterdam. If you want the latest code use Beijing.
As you are aware we have minor manual integration testing and no real continuous deployment so whether a particular component works on any day is up to random chance
/michael
Manikandan Ramachandran
Thankyou Michael
Manikandan Ramachandran
Hi Michael O'Brien,
With the amsterdam branch we are unable to access the services at their default ports (e.g. portalapps on 8989).
Based on the Kubernetes service configuration, it looks like services are exposed using NodePort (within the range 30000-32767). Even if we access a service on its configured NodePort, some services such as portalapps redirect back to the source port.
Michael O'Brien
Use vnc-portal on 30211 - everything works there because that VM in a container is inside the namespace and can resolve ports like 8989. There are workarounds to put a port redirector in Firefox that Vitaliy showed me - ideally we get this workaround into a JIRA. Sorry, but the ports are currently hardcoded in ONAP; OOM just serves them up.
Amsterdam works relatively fine except for the odd SDC 500/503 and SDNC issue during VF-module creation.
/michael
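As a quick illustration of the two kinds of port discussed above, here is a small Python sketch that tells a NodePort apart from an in-cluster port. It assumes the default Kubernetes NodePort range of 30000-32767 quoted earlier; the helper name is_node_port is hypothetical.

```python
# Default NodePort range; range() excludes its end, so 32768 keeps 32767 in.
NODE_PORT_RANGE = range(30000, 32768)

def is_node_port(port):
    """True if the port falls inside the default NodePort range."""
    return port in NODE_PORT_RANGE

# 8989 is the portalapps port and 30211 the vnc-portal port mentioned above.
for port in (8989, 30211):
    kind = "NodePort" if is_node_port(port) else "in-cluster port"
    print(port, kind)
```

A redirect to a port outside this range, such as 8989, can only be followed from somewhere that resolves the service inside the namespace, which is why vnc-portal works.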
Manikandan Ramachandran
What about changing the Service type from NodePort to ClusterIP? It is working for us but don't know whether there will be any regressions..
Syed Atif Husain
I have deployed onap oom on 64 GB RAM 300 GB HDD VM. All pods are running
After 3-4 days, CPU utilization becomes close to 200% and I need to restart VM or reinstall ONAP. I have faced this multiple times. Why does this happen?
Kiran Kamineni
./createall.sh seems to have a bug where creating a single component on a clean Rancher/k8s installation will fail with image pulling errors.
This is because, create_registry_key and create_service_account are not called when single components are installed.
I might be doing something wrong here though. Any suggestions?
Michael O'Brien
Testing on an Azure VM right now
Michael O'Brien
nice one kiran
OOM-805
workaround is to use HELM_APPS
#HELM_APPS=('consul' 'msb' 'mso' 'message-router' 'sdnc' 'vid' 'robot' 'portal' 'policy' 'appc' 'aai' 'sdc' 'dcaegen2' 'log' 'cli' 'multicloud' 'clamp' 'vnfsdk' 'uui' 'aaf' 'vfc' 'esr')
HELM_APPS=('robot' 'aai')
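If the component subset changes often, the HELM_APPS line can be generated rather than edited by hand. The following Python sketch renders the bash array in the same form as shown above; helm_apps_line and its exclude parameter are hypothetical helpers, not part of OOM.

```python
def helm_apps_line(apps, exclude=()):
    """Render a bash HELM_APPS array, skipping any excluded components."""
    kept = [app for app in apps if app not in exclude]
    quoted = " ".join(f"'{app}'" for app in kept)
    return f"HELM_APPS=({quoted})"

# Reproduces the reduced list from the workaround above.
print(helm_apps_line(["robot", "aai"]))

# Dropping components from a longer list (names taken from the full list above).
print(helm_apps_line(["robot", "aai", "vnfsdk", "aaf", "vfc"],
                     exclude={"vnfsdk", "aaf", "vfc"}))
```

The printed line can then be pasted into setenv.sh in place of the existing HELM_APPS assignment.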
Kiran Kamineni
Thanks! This works.
Pantelis Monogioudis
We can also confirm the problem reported originally by Kiran Kamineni. In our case the solution proposed also worked but we have slightly different outcome this morning.
Any ideas as to why one of the pods crashes and two pods are stuck in the init state? Our environment is behind a proxy.
Michael O'Brien
aai-service (both pods) depend on aai-traversal and aai-resources (so the root service is blocked) - see the resources yaml files for reference
aai-traversal has timed out - just delete the aai-resources container and kubernetes will restart it. then when it is 1/1 delete aai-service so it gets recreated.
note: there are limited number of resets - usually if you are not working after 30 min - you will never work until you bounce some pods
/michael
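The bounce order described above follows from the dependencies: pods that nothing else waits on come first, and aai-service comes last. A minimal Python sketch of that ordering, assuming only the dependency stated in this comment (the real charts may declare more) and Python 3.9 or later for graphlib:

```python
from graphlib import TopologicalSorter

# Dependency as described above: aai-service waits on both other pods.
DEPENDS_ON = {
    "aai-service": {"aai-traversal", "aai-resources"},
}

def bounce_order(depends_on):
    """Return pods ordered so that each one comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())

order = bounce_order(DEPENDS_ON)
print(order)
```

Whatever order the two prerequisites appear in, aai-service is always the last entry, matching the advice to recreate it only once the others are 1/1.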
Pantelis Monogioudis
Thank you but something must have changed - even if I do execute the script to delete all, I still get the same error.
Michael O'Brien
Just noticed your date - thought you were before the 16th - you were the first to discover the kubernetes 1.8.9 regression that was ported by Rancher back to 1.6.14
you and one of the CD systems that had no watcher for a couple of days
(last good build) http://jenkins.onap.info/job/oom-cd/2410/console
Workaround is to run with 1.6.12 (and docker 1.12) for now until
https://github.com/kubernetes/kubernetes/issues/61076
is fixed
see the onap-discuss trail
https://lists.onap.org/pipermail/onap-discuss/2018-March/008751.html
------------------wrote this before I noticed your date
master has an issue since friday if you reinstalled rancher since then
OOM-813
can you bounce your individual containers
the last to bounce would be aai-service
like
wait until it is up to bounce the others in sequence.
current master state - see OOM-813
Dominic Lunanuova
When retrieving cd.sh attachment, I could not find:
https://jira.onap.org/secure/attachment/11262/cd.sh
but I did find this in Jira: https://jira.onap.org/secure/attachment/11285/cd.sh
Michael O'Brien
Hard to keep all the changing references up to date - I do however keep the root oom_entrypoint.sh script current - as this is downloaded in my CD system to bring everything else in
Use the attachment here - I will remove all oom_rancher_setup.sh and cd.sh references from the wiki - until we get all this committed in the OOM repo
OOM-710
Dominic Lunanuova
thanks.
Another possible correction (or my pilot error). At the point in the instructions where it says:
there was no oom_rancher_install.sh in the directory. But there was an oom_rancher_setup.sh (which seemed to match the next part of the instructions), so I used it instead, and it seemed to do a lot of nice things. BTW, I got an error on first running it as user ubuntu, so I tried again using sudo, which worked.
Michael O'Brien
yes thanks - a mix of old and new edits - I used to name that script oom_rancher_install before I ported it to the OOM repo as oom_rancher_setup - like the captured output says.
Anyway I fine tuned the rancher script - it has been tested on openstack, azure, aws. You can use it to fully provision an Ubuntu 16 box - like you did - and yes if running non-root - do a sudo.
Usually you have to log out/in to pickup the ubuntu user as docker enabled - but lately not.
If you actually run the oom_entrypoint.sh script - you can walk away - assuming the branch is stable - and return after 80 min with a running system - note however that you should comment out cd.sh and replace your own onap-parameters.yaml before running it.
the rancher script is agnostic but very sensitive to the IP or DNS name for the server.
Dominic Lunanuova
Michael, one concept that is not jumping out at me: what is the command for running a helm chart for a new app after k8s is deployed using these instructions? i.e. when first developing your helm chart
Michael O'Brien
Check the ./createAll.bash script - it currently wraps helm until all the refactoring is in to make this script obsolete
like "helm install..." in the create_onap_helm() function
For ease of use - edit HELM_APPS for what pods you want in setenv.sh and run ./createAll.bash -n onap
You can also use the Kubernetes UI to deploy/generate a new chart
or follow the logging demo RI - where a new app was added for logging
Logging Reference Implementation#DeployingtheRI
Dominic Lunanuova
thx. very useful! does this sound right?
pranjal sharma
Hello All,
I was able to create/deploy the vFirewall package (packet generator, sink and firewall VNF) on the OpenStack cloud.
But I could not log in to any of the VNF VMs.
After debugging, I saw that I had not replaced the default public key with our local public key in the packet generator curl JSON UI.
Now I am deploying the VNF again (same vFirewall package) on the OpenStack cloud, and plan to give our local public key in both the pg and sink JSON APIs.
I have some queries for clarification:
- How can we create a VNF package manually/dynamically using the SDC component (so that we can get into the VNF VM and access its capabilities)?
- I also want to implement service function chaining for the deployed vFirewall; please let me know how to proceed with that.
PS: I have installed/deployed ONAP using Rancher on Kubernetes (on the OpenStack cloud platform) without the DCAE component, so I have not been able to use Closed Loop Automation.
Kindly reply back soon.
Thanks,
Pranjal
I Chen
Is there a new problem with the robot pod? I'm using a freshly cloned master branch oom.git and see the following error when attempting to create the robot pod.
Michael O'Brien
Yes, track JIRA and onap-discuss
https://jira.onap.org/browse/OOM-815
https://lists.onap.org/pipermail/onap-discuss/2018-March/008731.html
If you watch those you will see the current state of the release
retesting after the merge
switched build back to master because of https://github.com/rancher/rancher/issues/12178
http://jenkins.onap.info/job/oom-cd/2457/console
/michael
Michael O'Brien
still busted - will try running robot via helm only and keeping it out of HELM_APPS
I Chen
For what it's worth, I haven't changed docker (17.03.2-ce), rancher (server 1.6.14, agent 1.2.9), kubectl (server gitversion 1.8.5-rancher1, client gitversion 1.8.6), and helm (2.6.1) since Mar. 16.
The changes I have since Mar. 16 are ONAP images and oom.git.
I Chen
Just in case this information is helpful, I see the same problem with appc. I also added a comment in OOM-815.
Michael O'Brien
Rancher has closed the issue as they have added 1.8.10 to 1.6.14 - retesting
Unfortunately Rancher 1.6.14, which was released months ago, has gone through 3 versions of Kubernetes in 7 days (1.8.5, 1.8.9 and 1.8.10) - need to see if we are compatible and also that helm 2.6.1 is ok with 1.8.9
https://github.com/rancher/rancher/issues/12178
via
http://jenkins.onap.info/job/oom-cd/2457/console
Note kubectl is now 1.8.10
We are in serious need of a DevOps testing team at ONAP
/michael
https://lists.onap.org/pipermail/onap-discuss/2018-March/008766.html
I Chen
I'm not sure what changed, whether it's that I'm manually starting each component and waiting for a component to become ready before starting a second component.
But there seems to be a dependency between portal and sdc. Portal now seems to wait for SDC, and portal-vnc becomes stuck in Init if SDC hasn't been started. (Reversing the order by starting SDC first, followed by Portal, seems to allow both Portal and SDC to come up.) Maybe it's the same problem as OOM-514, maybe not, because deleting and restarting does not help.
Radhika Kaslikar
Hi,
I am not able to clone the below scripts:
wget https://jira.onap.org/secure/attachment/ID/cd.sh
wget https://jira.onap.org/secure/attachment/ID/aaiapisimpledemoopenecomporg.cer
wget https://jira.onap.org/secure/attachment/ID/onap-parameters.yaml
wget https://jira.onap.org/secure/attachment/ID/aai-cloud-region-put.json
root@OnapServer:/home# wget https://jira.onap.org/secure/attachment/ID/cd.sh
--2018-03-29 13:54:23-- https://jira.onap.org/secure/attachment/ID/cd.sh
Resolving jira.onap.org (jira.onap.org)... 198.145.29.92
Connecting to jira.onap.org (jira.onap.org)|198.145.29.92|:443... connected.
HTTP request sent, awaiting response... 404
2018-03-29 13:54:24 ERROR 404: (no description).
root@OnapServer:/home# wget https://jira.onap.org/secure/attachment/ID/aaiapisimpledemoopenecomporg.cer
--2018-03-29 13:55:43-- https://jira.onap.org/secure/attachment/ID/aaiapisimpledemoopenecomporg.cer
Resolving jira.onap.org (jira.onap.org)... 198.145.29.92
Connecting to jira.onap.org (jira.onap.org)|198.145.29.92|:443... connected.
HTTP request sent, awaiting response... 404
2018-03-29 13:55:44 ERROR 404: (no description).
Have the locations of these scripts changed or do we need to use some other scripts ?
Thanks,
Radhika
I Chen
In case you haven't found the attachments, see Michael's earlier comment.
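The 404 responses above appear to come from requesting the literal ID placeholder rather than a numeric attachment number. A small Python sketch of building the URL and rejecting the placeholder, assuming the URL shape seen elsewhere in this thread (for example 11285 for cd.sh); attachment_url is a hypothetical helper and attachment numbers change over time.

```python
BASE = "https://jira.onap.org/secure/attachment"

def attachment_url(attachment_id, filename):
    """Build a Jira attachment URL, refusing non-numeric ids such as 'ID'."""
    attachment_id = str(attachment_id)
    if not attachment_id.isdigit():
        raise ValueError(f"attachment id must be numeric, got {attachment_id!r}")
    return f"{BASE}/{attachment_id}/{filename}"

# Attachment number quoted earlier in this thread for cd.sh.
print(attachment_url(11285, "cd.sh"))

try:
    attachment_url("ID", "cd.sh")
except ValueError as err:
    print("rejected:", err)
```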
Sindhu A
Hi
Can we currently use cd.sh to proceed with installation? (Onap on Kubernetes with Rancher)
Thanks
Sindhuri
Michael O'Brien
Yes, all ready to go
rancher + kubernetes = OOM-715
install OOM/ONAP = OOM-716
Radhika Kaslikar
Hi All,
Can someone please tell me what is the RAM requirement to deploy dcaegen2 in Amsterdam?
Thanks,
Radhika.
Pedro Barros
Hi all,
Is it possible to download the docker images through prepull_docker.sh and then load them onto the K8s VM?
Reason: on my OpenStack network, port 10001 is blocked.
Are there any alternatives to these pulls?
thanks
Michael O'Brien
The prepull script needs to login to 10001 in order to work - I recommend you work outside your proxy or open the port - you may have other issues later related to chef/git pulls anyway from inside containers.
Pedro Barros
ok, thanks for your advice.
one more question, what option (script(s)) do you recommend to full install ONAP: oom_entrypoint.sh or rancher_setup_1.sh + cd.sh from your git?
Tomasz Kapalka
Hi All
We have tried to run ONAP with the scripts, but both SDC-BE pods fail to come up. For both we have the following info from the kubernetes scheduler:
4/17/2018 2:17:33 PM I0417 12:17:33.279347 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"onap", Name:"dev-sdc-be-config-backend-z92d6", UID:"81704d62-4233-11e8-ad46-024d5e023284", APIVersion:"v1", ResourceVersion:"91938", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' No nodes are available that match all of the predicates: Insufficient pods (1).
4/17/2018 2:17:33 PM I0417 12:17:33.439987 1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"onap", Name:"dev-sdc-be-599585968d-bhxml", UID:"92387e7d-4233-11e8-ad46-024d5e023284", APIVersion:"v1", ResourceVersion:"91980", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' No nodes are available that match all of the predicates: Insufficient pods (1).
Has anyone encountered a similar problem?
Roger Maitland
It looks like you've run out of resources. You could add more nodes to your cluster or deploy a subset of the complete ONAP suite. To customize your deployment, edit the kubernetes/onap/values.yaml file and enable just the components you're interested in. There are more complete instructions in the OOM User Guide.
Cheers,
Roger
Tomasz Kapalka
I really doubt that as my VM is 16vCPUs and 104GB of RAM. The usage now is 7vCPUs and around 60GB of RAM. The scheduler would probably give me something like Insufficient CPU or Memory.
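The predicate named in the scheduler message is the pod count per node, which is tracked separately from CPU and memory. A hedged Python sketch of checking that headroom, assuming node JSON in the shape returned by `kubectl get node <name> -o json` (status.allocatable.pods as a string); the value 110 is the usual kubelet default and is used here only as an illustration, and pod_headroom is a hypothetical helper.

```python
def pod_headroom(node, scheduled_pods):
    """Return how many more pods the node can accept by pod count alone."""
    allocatable = int(node["status"]["allocatable"]["pods"])
    return allocatable - scheduled_pods

# Illustrative node; 110 is the common kubelet default, not a measured value.
sample_node = {"status": {"allocatable": {"pods": "110"}}}

for count in (89, 110):
    left = pod_headroom(sample_node, count)
    status = "ok" if left > 0 else "Insufficient pods"
    print(count, "pods scheduled ->", left, "left:", status)
```

If the headroom is zero, free CPU and RAM on the VM do not help; the pod limit or the node count has to change.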
Michael O'Brien
Check the branch now - SDC is passing the health check
William Kurkian
Hi All,
We are working on doing a deployment of ONAP. We are following the script instructions and downloading the cd.sh file.
However we get this error:
cp: cannot stat 'values.yaml': No such file or directory
I believe this is causing no pods to get setup, and is causing us other issues. I think we need to put this file in our current directory, but I don't see where to get it.
Could someone enlighten us on where to find the values.yaml file ?
Thanks,
William Kurkian
William Kurkian
Oh I feel dumb now. I found it in the official docs.
Pedro Barros
Hi, you can find the values.yaml file inside the oom_entrypoint script
Michael O'Brien
Guys,
Hi, In the current cd.sh I use a values.yaml copy to override the built in oom/kubernetes/onap/values.yaml.
You should be OK with the default values.yaml provided you enable the nexus3 repo.
https://git.onap.org/oom/tree/kubernetes/onap/values.yaml
OOM-710
See the current copy referenced in the oom_entrypoint.sh script - I only override the nexus3 repo - better to do the following which will be uploaded tonight
testing on
http://jenkins.onap.info/job/oom-cd-master/2765/console
wget https://jira.onap.org/secure/attachment/11414/values.yaml
this is not the right way to go.
We should be passing overrides as a separate -f yaml or a set command on the helm install.
turn this on
/michael
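The override approach suggested above can be sketched as a command builder. This Python example assembles a Helm 2 style `helm install` argument list with a separate values file and --set pairs, modelled on the createConfig.sh line quoted earlier in this thread. The file name override-values.yaml and the function helm_install_command are assumptions for illustration only.

```python
import shlex

def helm_install_command(chart, release, namespace, values_files=(), overrides=None):
    """Build a Helm 2 style install command with -f files and --set pairs."""
    cmd = ["helm", "install", chart, "--name", release, "--namespace", namespace]
    for path in values_files:
        cmd += ["-f", path]          # each file is layered over the chart defaults
    for key, value in (overrides or {}).items():
        cmd += ["--set", f"{key}={value}"]
    return cmd

cmd = helm_install_command(
    ".", "onap-config", "onap",
    values_files=["override-values.yaml"],   # hypothetical override file
    overrides={"nsPrefix": "onap"},
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping overrides in their own file leaves the chart's values.yaml untouched, so a fresh clone of the repo needs no edits.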
William Kurkian
Thanks, I have gotten past this part of the setup.
Thanks for your help,
William
Sergiusz Michalski
Hi Guys,
I have a problem with SDC-BE. When I try to enter SDC GUI (SDC-FE) in the SDC backend jetty logs I see the following exception:
2018-04-27T13:28:22.858Z|||||com.att.sdc.23911-SDCforTestDev-v001Client-0|||SDC-BE||||||||ERROR||||dev-sdc-be.onap||c.a.a.d.api.DefaultRequestProcessor||ActivityType=<?>, Desc=<2018-04-27 13:28:22.857 dev-sdc-be-55b786754f-62fwg 1865@dev-sdc-be-55b786754f-62fwg com.att.sdc.23911-SDCforTestDev-v001Client-0-252 null NULL com.att.aft.dme2.api.DefaultRequestProcessor AFT-DME2-0702 No endpoints were registered after trying all route offer search possibilities. Validate that the service has running instances that are properly registering and renewing their endpoint lease. [Context: service=https://dmaap-v1.dev.dmaap.dt.saat.acsi.att.com/events?version=1.0&envContext=TEST&partner=BOT_R&timeout=15000&limit=1;routeOffersTried=MR1:;]>
2018-04-27T13:28:22.858Z|||||com.att.sdc.23911-SDCforTestDev-v001Client-0|||SDC-BE||||||||ERROR||||dev-sdc-be.onap||o.o.s.b.c.d.e.DmaapClientFactory||ActivityType=<?>, Desc=<The exception {} occured upon fetching DMAAP message>
com.att.aft.dme2.api.DME2Exception: [AFT-DME2-0702]: No endpoints were registered after trying all route offer search possibilities. Validate that the service has running instances that are properly registering and renewing their endpoint lease. [Context: service=https://dmaap-v1.dev.dmaap.dt.saat.acsi.att.com/events?version=1.0&envContext=TEST&partner=BOT_R&timeout=15000&limit=1;routeOffersTried=MR1:;]
at com.att.aft.dme2.api.DefaultRequestProcessor.send(DefaultRequestProcessor.java:188)
at com.att.aft.dme2.api.RequestFacade.send(RequestFacade.java:26)
at com.att.aft.dme2.api.DME2Client.send(DME2Client.java:116)
at com.att.aft.dme2.api.DME2Client.sendAndWait(DME2Client.java:136)
at com.att.aft.dme2.api.DME2Client.sendAndWait(DME2Client.java:320)
at com.att.nsa.mr.client.impl.MRConsumerImpl.fetch(MRConsumerImpl.java:130)
at com.att.nsa.mr.client.impl.MRConsumerImpl.fetch(MRConsumerImpl.java:100)
at org.openecomp.sdc.be.components.distribution.engine.DmaapConsumer.lambda$consumeDmaapTopic$1(DmaapConsumer.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
^C
I can't locate the service it is pointing to:
Context: service=https://dmaap-v1.dev.dmaap.dt.saat.acsi.att.com/events?version=1.0&envContext=TEST&partner=BOT_R&timeout=15000&limit=1;routeOffersTried=MR1:;]
I also had to create the routeInfo.xml file for this service manually, because it was complaining about a missing metafile.
Path in the SDC-BE pod:
/var/lib/jetty/dme2-fs-registry/service=com.att.acsi.saat.dt.dmaap.dev.dmaap-v1/version=1.0/envContext=TEST/routeInfo.xml
I put some artificial content there related to the MR1 route info:
<?xml version="1.0" encoding="UTF-8"?>
<routeInfo serviceName="com.att.acsi.saat.dt.dmaap.dev.dmaap-v1" serviceVersion="1.0" envContext="TEST" xmlns="http://aft.att.com/dme2/types">
<dataPartitionKeyPath>/x/y/z</dataPartitionKeyPath>
<dataPartitions>
<dataPartition name="MR1" low="205977" high="205999"/>
</dataPartitions>
<routeGroups>
<routeGroup name="MR1">
<partner>BOT_R</partner>
<route name="MR1">
<dataPartitionRef>MR1</dataPartitionRef>
<stickySelectorKey>MR1</stickySelectorKey>
<routeOffer name="MR1" sequence="1" active="true"/>
</route>
</routeGroup>
</routeGroups>
</routeInfo>
My question is: what is wrong with this service, and how can I fix it so that I no longer receive these errors and can successfully enter the SDC UI?
Many thanks for your help and quick response(s),
Sergiusz
William Kurkian
Hello,
I am trying to bring up the portal UI with kubernetes. I have an issue with one of the portal containers. It has an InvalidImageName error.
It is this pod: dev-portal-db-7b5dfd476f-hkhfv
I get this error in the pod: Warning InspectFailed 3m (x134 over 53m) kubelet, onap-dev Failed to apply default image tag "nexus3.onap.org:/onap/portal-db:2.1-STAGING-latest": couldn't parse image reference "nexus3.onap.org:/onap/portal-db:2.1-STAGING-latest": invalid reference format
Thanks,
William
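The image reference in that error has nothing between the colon after nexus3.onap.org and the following slash, so the registry port is empty and the reference cannot be parsed. A minimal Python sketch of that check follows. The helper name is hypothetical, and the port 10001 in the second example is only an assumption taken from other image names in this thread, not a confirmed fix for the portal chart.

```python
def empty_registry_port(image: str) -> bool:
    """Return True when the registry part of an image reference ends in ':' with no port."""
    first, sep, _rest = image.partition("/")
    # Only treat the first component as a registry when a path follows it.
    return bool(sep) and first.endswith(":")


# Reference copied from the error message above.
bad = "nexus3.onap.org:/onap/portal-db:2.1-STAGING-latest"
# Assumed repaired form (port 10001 is an assumption, see the note above).
good = "nexus3.onap.org:10001/onap/portal-db:2.1-STAGING-latest"

print(empty_registry_port(bad))
print(empty_registry_port(good))
```

If the check flags the reference, the place to look is probably wherever the deployment builds the image name from a repository host and port.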
Pedro Barros
Hi Michael O'Brien,
Was there some update to the Amsterdam release? Right now I'm getting these non-working pods:
Last week only aaf didn't work.
Thanks
Michael O'Brien
Hi, join the discussion and the PTL meeting on Monday - what you are experiencing is the nexus3 change made on Friday by the LF. They have completely stopped ONAP deployment (all branches).
50 snapshot containers are being held until a release is requested - hence the ImagePullBackOff
CIMAN-157
https://lists.onap.org/pipermail/onap-discuss/2018-April/009336.html
Guys (DevOps) for your reference I requested the docker issue be the first one on the agenda of the 0900 EDT ONAP PTL meeting Monday to devote to this nexus3 devops issue – a quick fix and impact going forward – you are welcome to join.
/michael
https://lists.onap.org/pipermail/onap-discuss/2018-April/009337.html
https://zoom.us/j/283628617
Radhika Kaslikar
Hi All,
I am trying to run the demo for vFWCL.
I was able to launch instances of vFW, vSINK and vPG in OpenStack, and my next step is to run the robot heatbridge test case.
I have created the VNFs using vFWCL as the service type.
But when I run robot heatbridge using
./demo-k8s.sh heatbridge vFW_SINK_Module 9d16977c-0330-4f00-90e0-7a99a2fc5f23 vFWCL
I get the error "Dictionary does not contain key 'vFWCL'".
I saw that /var/opt/OpenECOMP_ETE/robot/assets/service_mappings.py in the robot container does not have an entry for vFWCL, but it has entries for vFWSNK, vPKG and vFW.
Do I need to use a newer script? Or do I need to run heatbridge with vFW or vFWSNK, e.g.
./demo-k8s.sh heatbridge vFW_SINK_Module 9d16977c-0330-4f00-90e0-7a99a2fc5f23 vFW
in place of vFWCL, even though I am using vFWCL as the service type? I am on the Amsterdam release.
Any help would be really appreciated.
Thanks,
Radhika.
Michael O'Brien
a helm namespace delete or a kubectl delete or a helm purge may not remove everything based on hanging PVs - use
William Kurkian
Does anyone know why, after successfully deploying, the kubernetes pods disappear after an hour or two?
When I say disappear, I mean that kubectl get pods -n onap fails to find resources.
William
Roger Maitland
Would you paste the output of:
kubectl get pods --all-namespaces
?
William Kurkian
Hi,
I ran the command, here is the output:
kubectl get pods -all-namespaces
No resources found.
Thanks
Roger Maitland
I suspect you missed a dash (there's a double dash before all-namespaces). The only time I've heard of pods being spontaneously lost is if your hardware is undersized.
William Kurkian
It happened this morning, and I found that the helm release was gone, so I had to redo the helm install. We have 128 GB of RAM and 16 processors. Do you know which undersized hardware component could cause this?
Roger Maitland
That's enough for a deployment of ONAP. Where is Kubernetes running, on the same machine? Are you using Rancher to manage K8s?
William Kurkian
Kubernetes is running on that machine, and we are using rancher.
Roger Maitland
Generally best practice is to have them running on separate machines but I wouldn't think that is the problem here. Are you able to re-install and make progress?
William Kurkian
Well this is just a VM on a larger machine, but the VM is assigned 128 GB. I don't think that should cause problems.
Yeah, I can reinstall. The pods usually go down again some time after reinstalling. I am able to make progress, but I lose time redeploying once or twice a day, which can be pretty time-consuming.
Roger Maitland
I know of systems that have been in production for a long time (months) so you shouldn't have to re-install normally. Thanks for not giving up, it's important that we identify these types of issues as you might not be the only one to experience them. Maybe we (I) can add further documentation to avoid the problem if we know what it is.
William Kurkian
Thanks for your help so far. I'll see if it keeps happening, and if I can figure anything else out about it. Definitely we should document it if we can figure it out.
William Kurkian
Hi Roger,
I did an install with all the components, and it is not disappearing like before. I have noticed one issue:
Some pods are stuck in Pending with a FailedScheduling error that indicates insufficient pods. General utilization of other resources is very low. I read around, and it seems I might need to add a new node to the cluster. I have plenty of resources available, so is this something I could do on the same machine?
Roger Maitland
Happy to hear that, William. Kubernetes has a default limit of 110 pods per node, which probably seemed like an impossibly large number, but we're hitting it in ONAP now. You can add nodes to your cluster (via Rancher) and the pods will get distributed. Unfortunately, some of the projects have assumed that business logic and databases are co-located, which isn't true in a system with multiple nodes, so we've seen problems. The OOM team has been working with the other project teams to fix this, so it would be great if you would try out a multi-node deployment (I expect you won't see any problems but...).
The configuration section of the OOM User Guide describes how you can customize your deployment to a subset of the total ONAP components. This allows a great deal of flexibility and may allow you to avoid the pod limit if you can work with less than all of the ONAP components.
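As a rough back-of-the-envelope aid for the pod limit discussed above, here is a small Python sketch that estimates the minimum number of nodes from a pod count. It assumes the 110-pods-per-node default mentioned in this thread and ignores CPU, memory, and the system pods that also count against the limit. The function name and the sample count of 150 are hypothetical.

```python
import math

DEFAULT_MAX_PODS_PER_NODE = 110  # default per-node pod limit mentioned in this thread


def min_nodes(pod_count: int, max_pods: int = DEFAULT_MAX_PODS_PER_NODE) -> int:
    """Lower bound on node count if the per-node pod limit were the only constraint."""
    if pod_count < 0 or max_pods <= 0:
        raise ValueError("pod_count must be >= 0 and max_pods must be > 0")
    return math.ceil(pod_count / max_pods)


# Hypothetical example: a deployment that creates 150 pods in total.
print(min_nodes(150))
```

The result is only a lower bound. Real scheduling also depends on resource requests and on how many system pods each node already runs.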
Gary Wu
I'm observing that k8s is very casual in pulling images. It can often be minutes before it pulls the next one even when there are many more images to pull still, and even if all of the images are available in a cache on the local network. Is there a configuration in k8s or helm where we can speed up the image pulling?
Incidentally this seems to be more pronounced in Beijing; I don't recall observing such a behavior in Amsterdam.
Michael O'Brien
The pulls have a periodic backoff during the lifecycle phases PullBackoff and ImagePull...
However definitely we need to fine tune all the default parameters of the system around resource usage, external access...
Raised the following 2 days ago after our talk - just saw this message Gary.
OOM-1030
thank you
/michael
William Kurkian
I've managed to deploy with kubernetes and pull up the portal.
I am currently trying to log in and having issues. I use the user demo/demo123456! and it seems correct based on the debug logs on the container. However, the login doesn't work: the backend doesn't send back a session ID.
I notice from the error logs that the session is reported as expired, and there is a NullPointerException in the MUSIC project while trying to get a lock from ZooKeeper.
java.lang.NullPointerException: null
at org.onap.music.lockingservice.MusicLockingService.createLockId(MusicLockingService.java:112)
I am not sure if this is the place to ask about it. I would appreciate any help or direction on a better place to ask.
Thanks,
William
Michael O'Brien
William, thank you for your deployment efforts and triage - very useful
The MUSIC team would be interested in this defect - especially an NPE - you could raise a JIRA (bug against the Beijing release)
MUSIC-69
Here is a recent example - hit the create button
You can also raise this issue on the discuss list onap-discuss under the header MUSIC
see examples on
https://lists.onap.org/pipermail/onap-discuss/2018-May/
/michael
William Kurkian
Great, I will contact the MUSIC team. Thanks for the instructions.
William
Wass Mailing
Hello everyone,
I followed all the steps for installing ONAP Amsterdam, but I'm stuck on this problem when I execute the cd.sh script:
The portal and other pods are stuck in CrashLoopBackOff, and I don't know why.
10 pending > 0 at the 78th 15 sec interval
onap-aai aai-service-569577f788-tqhgk 0/1 CrashLoopBackOff 17 10m
onap-aai elasticsearch-6b577bf757-qmmm4 0/1 CrashLoopBackOff 17 10m
onap-message-router dmaap-6b7955b8c6-jk5vd 0/1 CrashLoopBackOff 15 10m
onap-portal portalwidgets-6c5f8944d8-mn4jh 0/1 CrashLoopBackOff 17 10m
onap-portal vnc-portal-6cfc74bc57-vrh2d 0/1 CrashLoopBackOff 18 10m
onap-sdc sdc-cs-6f858cccf9-hmn4p 0/1 Running 0 10m
onap-sdnc dmaap-listener-6f6d54fdbb-8mzgb 0/1 CrashLoopBackOff 20 10m
onap-sdnc sdnc-portal-7c664c5bcd-v4vsd 0/1 CrashLoopBackOff 17 10m
onap-sdnc ueb-listener-7f496cb859-fgpt6 0/1 CrashLoopBackOff 19 10m
onap-vid vid-mariadb-575fd8f48-2zv2k 0/1 CrashLoopBackOff 20 10m
10 pending > 0 at the 79th 15 sec interval
Thanks for your answers !
Michael O'Brien
Do either a describe pod or a logs -f on one of the pods to see whether the issue appears there.
Do you need Amsterdam? The branch is essentially abandoned in favor of a pure Kubernetes deployment in beijing/master.
Pedro Barros
hi all,
is anyone experiencing issues with the consul-agent pod??
Pedro
William Kurkian
Hi All,
We are deploying with 2 VMs on a machine with KVM, and Rancher. We are having issues with sessions in the Portal UI.
However, the Portal team was not able to reproduce our issues. I noticed that the test environment is OpenStack-based, but we do not use that. Is it possible that using a non-OpenStack environment could cause issues with some of the pods, and with sessions in a Spring UI like the Portal?
Thanks,
William
William Kurkian
Is the VES collector from the DCAE included in this deployment ?
Thanks,
William
William Kurkian
Okay, it is there. I just had some dcae pods that needed to be restarted.
mudassar rana
Hi ,
I am trying to deploy ONAP using the master branch.
I used the oom_rancher_setup script, fetched with
wget https://git.onap.org/logging-analytics/plain/deploy/rancher/oom_rancher_setup.sh
for the undercloud installation.
But I am facing an issue while running kubectl get pods --all-namespaces:
Error → Error from server (Internal Error) : an error on the server ("Service Unavailable").
I cross-checked and found that host registration with Rancher is failing.
Please help me with this issue.
Michael O'Brien
Depending on your undercloud, the IP discovered by the agent may not be the routable IP - you can override this with the setting -c false and a routable IP
https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh
mudassar rana
Michael,
If I use the same IP for the Amsterdam branch installation, I don't see any issue.
But when I use it for the master branch installation, I hit the host registration issue.
Is it still a routable IP issue?
Michael O'Brien
I had an issue with the format of the alternative option when -c is false (evidently I tested with true on AWS but not false the last time I committed) - I have a patch I have been trying to get in for the last 3 days.
Using -c false seems to be needed only on OpenStack systems - not AWS.
If you override the address, then use the patch below:
https://gerrit.onap.org/r/#/c/54677/
Winnie Tsang (IBM)
Has anyone had a problem with onap-clamp? It keeps crashing in my environment.
onap-clamp-7d69d4cdd7-ct7b6 1/2 CrashLoopBackOff 24 1h
kubectl logs onap-clamp-7d69d4cdd7-ct7b6 -n onap
Error from server (BadRequest): a container name must be specified for pod onap-clamp-7d69d4cdd7-ct7b6, choose one of: [clamp-filebeat-onap clamp] or one of the init containers: [clamp-readiness]
There are only 2 containers in this pod (clamp-filebeat-onap and clamp), but the init container clamp-readiness is trying to use a container called clampdb, which is why it failed. I'm not sure what I should change here. Should I replace clampdb with clamp?
Best regards,
Winnie
Kiran Kamineni
What do you get when you try kubectl describe po/onap-clamp-7d69d4cdd7-ct7b6 -n onap ? It should tell you which container is not coming up and what is the reason for failure.
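The BadRequest error quoted above already lists the container names that kubectl logs accepts through its -c option. A small Python sketch, under the assumption that the message keeps the "choose one of: [...]" shape shown in this thread, can turn that message into one logs command per container. Both helper names are hypothetical.

```python
import re
from typing import List


def container_choices(error_text: str) -> List[str]:
    """Extract the container names kubectl lists after 'choose one of:'."""
    match = re.search(r"choose one of: \[([^\]]*)\]", error_text)
    return match.group(1).split() if match else []


def logs_commands(pod: str, namespace: str, error_text: str) -> List[str]:
    """Build one 'kubectl logs' command per regular (non-init) container."""
    return [
        f"kubectl logs {pod} -c {name} -n {namespace}"
        for name in container_choices(error_text)
    ]


# Error text copied from the comment above.
error = (
    "Error from server (BadRequest): a container name must be specified for pod "
    "onap-clamp-7d69d4cdd7-ct7b6, choose one of: [clamp-filebeat-onap clamp] "
    "or one of the init containers: [clamp-readiness]"
)
for command in logs_commands("onap-clamp-7d69d4cdd7-ct7b6", "onap", error):
    print(command)
```

The sketch deliberately skips the init containers. Their logs can be requested the same way by passing the init container name to -c.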
Winnie Tsang (IBM)
Hi @Michael O'Brien,
Can cd.sh support a Helm setup that requires a TLS cert?
Best Regards,
Winnie
Nalini Varshney
I have a bare metal machine with 337G RAM and 72 CPUs, and on it I have created a virtual machine with 140G RAM and 32 CPUs using Vagrant.
After creating the virtual machine, I created the Rancher environment using the script below
https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh
and followed http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html#quick-start-label to install the ONAP Amsterdam version, but faced issues like
kube-system heapster-4285517626-nwf5j 0/1 Evicted 0 4h
kube-system heapster-4285517626-tz2p4 0/1 Pending 0 3h
kube-system kube-dns-638003847-gqttd 3/3 Running 3 4h
kube-system kubernetes-dashboard-716739405-47ngl 1/1 Running 1 4h
kube-system monitoring-grafana-2360823841-v6xgx 0/1 Pending 0 3h
kube-system monitoring-grafana-2360823841-xq03t 0/1 Evicted 0 4h
kube-system monitoring-influxdb-2323019309-15rq5 0/1 Evicted 0 4h
kube-system monitoring-influxdb-2323019309-8fp36 0/1 Pending 0 3h
kube-system tiller-deploy-737598192-kg3z2 0/1 Pending 0 3h
kube-system tiller-deploy-737598192-q8t8q 0/1 Evicted 0 4h
onap config 0/1 Completed 0 3h
onap-aaf aaf-1993711932-9mpz3 0/1 Init:ErrImagePull 0 3h
onap-aaf aaf-cs-1310404376-hcg7c 0/1 Pending 0 11m
onap-aaf aaf-cs-1310404376-mbpl4 0/1 Evicted 0 3h
onap-aai aai-resources-2398553481-b83jt 0/2 Evicted 0 3h
onap-aai aai-resources-2398553481-r0f07 0/2 Pending 0 3h
onap-aai aai-service-749944520-6261d 0/1 Init:ImagePullBackOff 0 3h
onap-aai aai-traversal-2677319478-j3q29 0/2 Evicted 0 3h
onap-aai aai-traversal-2677319478-ss8f8 0/2 Pending 0 3h
onap-aai data-router-3700447603-s7mcm 0/1 Pending 0 2h
onap-aai data-router-3700447603-t352p 0/1 Evicted 0 3h
onap-aai elasticsearch-622738319-sb6z7 0/1 Evicted 0 3h
onap-aai elasticsearch-622738319-vrt1j 0/1 Pending 0 1h
onap-aai hbase-3471984843-5vfpv 0/1 Evicted 0 3h
onap-aai hbase-3471984843-n0528 0/1 Pending 0 2h
onap-aai model-loader-service-911950978-94pdk 0/2 Evicted 0 3h
onap-aai model-loader-service-911950978-9nw1d 0/2 Pending 0 1h
onap-aai search-data-service-2471976899-9dgqv 0/2 Pending 0 2h
onap-aai search-data-service-2471976899-z8q9c 0/2 Evicted 0 3h
onap-aai sparky-be-1779663793-8850k 0/2 Evicted 0 3h
onap-aai sparky-be-1779663793-rxmmt 0/2 Pending 0 2h
onap-appc appc-1828810488-862mm 0/2 Evicted 0 3h
onap-appc appc-1828810488-qfkm9 0/2 Pending 0 24m
onap-appc appc-dbhost-2793739621-05042 0/1 Pending 0 3h
onap-appc appc-dbhost-2793739621-s767p 0/1 Evicted 0 3h
onap-appc appc-dgbuilder-2298093128-k76np 0/1 Init:ImagePullBackOff 0 3h
onap-clamp clamp-2211988013-g4h6g 0/1 Init:ImagePullBackOff 0 3h
onap-clamp clamp-mariadb-1812977665-15hjm 0/1 ErrImagePull 0 3h
onap-cli cli-2960589940-2w66r 0/1 ErrImagePull 0 3h
onap-consul consul-agent-3312409084-k64t3 0/1 Error 0 3h
onap-consul consul-server-1173049560-0mzzc 0/1 Pending 0 3h
onap-consul consul-server-1173049560-2v0gr 0/1 Evicted 0 3h
onap-consul consul-server-1173049560-6pccj 0/1 Pending 0 3h
onap-consul consul-server-1173049560-9z19v 0/1 Evicted 0 3h
onap-consul consul-server-1173049560-d4hsn 0/1 Evicted 0 3h
onap-consul consul-server-1173049560-fdrhx 0/1 Pending 0 3h
onap-dcaegen2 heat-bootstrap-4010086101-l1v9x 0/1 Pending 0 2h
onap-dcaegen2 heat-bootstrap-4010086101-m3jjs 0/1 Evicted 0 3h
onap-dcaegen2 nginx-1230103904-qzdkv 0/1 Evicted 0 3h
onap-dcaegen2 nginx-1230103904-z1xjg 0/1 Pending 0 1h
onap-esr esr-esrgui-1816310556-j7kgq 0/1 ErrImagePull 0 3h
onap-esr esr-esrserver-1044617554-dthp2 0/1 ErrImagePull 0 3h
onap-kube2msb kube2msb-registrator-4293827076-dlpwv 0/1 ErrImagePull 1 3h
onap-log elasticsearch-1942187295-tb6z6 0/1 Init:0/1 0 3h
onap-log kibana-3372627750-kpg49 0/1 Init:ImagePullBackOff 0 3h
onap-log logstash-1708188010-vwpvd 0/1 Init:ImagePullBackOff 0 3h
onap-message-router dmaap-3126594942-h8q6d 0/1 Evicted 0 3h
onap-message-router dmaap-3126594942-zt8tv 0/1 Pending 0 3h
onap-message-router global-kafka-3848542622-cd1nl 0/1 Init:ErrImagePull 0 3h
onap-message-router zookeeper-624700062-h6j79 0/1 Evicted 0 3h
onap-message-router zookeeper-624700062-ktdnz 0/1 Pending 0 3h
onap-msb msb-consul-3334785600-0brtn 0/1 Evicted 0 3h
onap-msb msb-consul-3334785600-fm1xx 0/1 Pending 0 3h
onap-msb msb-discovery-196547432-76n76 0/1 Pending 0 3h
onap-msb msb-discovery-196547432-dqx8j 0/1 Evicted 0 3h
onap-msb msb-eag-1649257109-3xmvs 0/1 Pending 0 3h
onap-msb msb-eag-1649257109-6z01g 0/1 Evicted 0 3h
onap-msb msb-iag-1033096170-cbf26 0/1 Evicted 0 3h
onap-msb msb-iag-1033096170-jls94 0/1 Pending 0 3h
onap-mso mariadb-829081257-mp3jw 0/1 ErrImagePull 1 3h
onap-mso mso-681186204-8x39r 0/2 Evicted 0 3h
onap-mso mso-681186204-l1tsn 0/2 Pending 0 3h
onap-multicloud framework-4234260520-3z8pz 0/1 ErrImagePull 0 3h
onap-multicloud multicloud-ocata-629200416-vsswq 0/1 ErrImagePull 0 3h
onap-multicloud multicloud-vio-1286525177-9hshb 0/1 Evicted 0 3h
onap-multicloud multicloud-vio-1286525177-q9wfc 0/1 Pending 0 20m
onap-multicloud multicloud-windriver-458734915-nsf2d 0/1 ErrImagePull 0 3h
onap-policy brmsgw-2284221413-lckgg 0/1 Evicted 0 3h
onap-policy brmsgw-2284221413-szxm5 0/1 Pending 0 3h
onap-policy drools-534015681-c6lhf 0/2 Evicted 0 3h
onap-policy drools-534015681-sts6b 0/2 Pending 0 3h
onap-policy mariadb-559003789-mrt5f 0/1 ErrImagePull 0 3h
onap-policy nexus-687566637-bp1f2 0/1 ErrImagePull 0 3h
onap-policy pap-4181215123-s0x5p 0/2 Evicted 0 3h
onap-policy pap-4181215123-w4q38 0/2 Pending 0 3h
onap-policy pdp-2622241204-fvctc 0/2 Pending 0 3h
onap-policy pdp-2622241204-kkz83 0/2 Evicted 0 3h
onap-portal portalapps-1783099045-n4l1v 0/2 Evicted 0 3h
onap-portal portalapps-1783099045-q075v 0/2 Pending 0 26m
onap-portal portaldb-1451233177-5kc7v 0/1 Evicted 0 3h
onap-portal portaldb-1451233177-r5wvm 0/1 Pending 0 3h
onap-portal portalwidgets-2060058548-4gvq1 0/1 Init:ErrImagePull 0 3h
onap-portal vnc-portal-1252894321-f691t 0/1 Evicted 0 3h
onap-portal vnc-portal-1252894321-n33dj 0/1 Pending 0 3h
onap-robot robot-2176399604-fx3pl 0/1 Evicted 0 3h
onap-robot robot-2176399604-gt1nl 0/1 Pending 0 3h
onap-sdc sdc-be-2336519847-v2bs5 0/2 Pending 0 1h
onap-sdc sdc-be-2336519847-vkp9q 0/2 Evicted 0 3h
onap-sdc sdc-cs-1151560586-qm2xh 0/1 Init:ImagePullBackOff 0 3h
onap-sdc sdc-es-3319302712-8nr9l 0/1 Pending 0 1h
onap-sdc sdc-es-3319302712-vc9wf 0/1 Evicted 0 3h
onap-sdc sdc-fe-2862673798-27rvw 0/2 Evicted 0 3h
onap-sdc sdc-fe-2862673798-904p6 0/2 Pending 0 2h
onap-sdc sdc-kb-1258596734-4p3vl 0/1 Evicted 0 3h
onap-sdc sdc-kb-1258596734-fgskr 0/1 Pending 0 15m
onap-sdnc dmaap-listener-3967791773-tg10q 0/1 Pending 0 3h
onap-sdnc dmaap-listener-3967791773-vrj1z 0/1 Evicted 0 3h
onap-sdnc sdnc-1507781456-0bjp9 0/2 Pending 0 26m
onap-sdnc sdnc-1507781456-f2lv5 0/2 Evicted 0 3h
onap-sdnc sdnc-dbhost-3029711096-mpspz 0/1 Pending 0 3h
onap-sdnc sdnc-dbhost-3029711096-mr0sv 0/1 Evicted 0 3h
onap-sdnc sdnc-dgbuilder-4011443503-8mngl 0/1 Evicted 0 3h
onap-sdnc sdnc-dgbuilder-4011443503-n8nk3 0/1 Pending 0 3h
onap-sdnc sdnc-portal-516977107-5vz53 0/1 Evicted 0 3h
onap-sdnc sdnc-portal-516977107-dwn81 0/1 Pending 0 3h
onap-sdnc ueb-listener-1749146577-clkdk 0/1 Pending 0 3h
onap-sdnc ueb-listener-1749146577-n5kk5 0/1 Evicted 0 3h
onap-uui uui-4267149477-clghd 0/1 ErrImagePull 0 3h
onap-uui uui-server-3441797946-k5qr3 0/1 ErrImagePull 0 3h
onap-vfc vfc-catalog-840807183-q8xjx 0/1 ErrImagePull 0 3h
onap-vfc vfc-emsdriver-2936953408-3h71c 0/1 Pending 0 4m
onap-vfc vfc-emsdriver-2936953408-rqq7f 0/1 Evicted 0 3h
onap-vfc vfc-gvnfmdriver-2866216209-3n1vd 0/1 Evicted 0 3h
onap-vfc vfc-gvnfmdriver-2866216209-fv07l 0/1 Pending 0 9m
onap-vfc vfc-hwvnfmdriver-2588350680-4ls0l 0/1 Pending 0 5m
onap-vfc vfc-hwvnfmdriver-2588350680-9zr1m 0/1 Evicted 0 3h
onap-vfc vfc-jujudriver-406795794-4q57r 0/1 ErrImagePull 0 3h
onap-vfc vfc-nokiavnfmdriver-1760240499-0jq3l 0/1 ErrImagePull 0 3h
onap-vfc vfc-nslcm-3756650867-hp3kz 0/1 ErrImagePull 0 3h
onap-vfc vfc-resmgr-1409642779-452bh 0/1 ErrImagePull 0 3h
onap-vfc vfc-vnflcm-3340104471-t6lxk 0/1 ErrImagePull 0 3h
onap-vfc vfc-vnfmgr-2823857741-nkvm9 0/1 ErrImagePull 0 3h
onap-vfc vfc-vnfres-1792029715-7b2l0 0/1 ErrImagePull 0 3h
onap-vfc vfc-workflow-3450325534-s61kw 0/1 ErrImagePull 0 3h
onap-vfc vfc-workflowengineactiviti-4110617986-kg3gn 0/1 ErrImagePull 0 3h
onap-vfc vfc-ztesdncdriver-1452986549-9ztr7 0/1 Evicted 0 3h
onap-vfc vfc-ztesdncdriver-1452986549-stgvt 0/1 Pending 0 3m
onap-vfc vfc-ztevmanagerdriver-2080553526-9tgvv 0/1 ErrImagePull 0 3h
onap-vid vid-mariadb-3318685446-4qqnx 0/1 Pending 0 3h
onap-vid vid-mariadb-3318685446-hxjdt 0/1 Evicted 0 3h
onap-vid vid-server-421936131-4vjwk 0/2 Evicted 0 3h
onap-vid vid-server-421936131-gwb34 0/2 Pending 0 3h
onap-vnfsdk postgres-436836560-fsnc0 0/1 ImagePullBackOff 0 3h
onap-vnfsdk refrepo-1924147637-l3n8l 0/1 Init:ImagePullBackOff 0 3h
Roger Maitland
What version of ONAP are you trying to work with? You're using the instructions from 'latest' (basically Beijing at this point) but you say that you're trying to install the Amsterdam release.
Nalini Varshney
Hi Roger Maitland, thanks for the response.
My requirement is the Amsterdam version of ONAP. Please guide me on the way and the steps to install the Amsterdam version of ONAP.
Roger Maitland
Here is a link to the Amsterdam ONAP Operations Manager documentation - specifically the 'OneClick' deployment: https://onap.readthedocs.io/en/amsterdam/submodules/oom.git/docs/OOM%20User%20Guide/oom_user_guide.html?#onap-oneclick-deployment-walk-though
Be sure to clone the amsterdam version of OOM when using these instructions.
Note that the ONAP development community has been working on the beijing release during 2018.
Winnie Tsang (IBM)
What is the CPU usage on your worker node? Does your worker node constantly go to the "Not Ready" state? I recommend increasing the vCPUs on your VM to 48 or 64.
PS: For Amsterdam, you can run it on an all-in-one k8s node. But for Beijing (master), you need at least 2 worker nodes to fit all of the pods created by ONAP.
Nalini Varshney
Hi Winnie Tsang (IBM), thanks for the response. My VM has 32 CPUs, and I tried to install the Amsterdam version of ONAP. Please guide me on this.
My requirement is the Amsterdam version of ONAP. Please provide the way and the steps to install the Amsterdam version of ONAP.
Winnie Tsang (IBM)
Let's check why there are pods in the Evicted and Pending states. You can get that by issuing "kubectl describe pod <pod-name> -n <namespace>", which should tell you the reason. I suspect that some resource (storage, CPU, etc.) in your environment is insufficient, so the pods are evicted from the node and the scheduler can't find a node to assign them to.
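With a listing as long as the one above, it can help to generate the describe commands rather than type them. The Python sketch below assumes the six-column layout of the pasted listing (namespace, name, ready, status, restarts, age) and groups the describe commands by status. The function name is hypothetical, and the sample rows are copied from the listing.

```python
from typing import Dict, List, Sequence


def describe_commands(
    listing: str, statuses: Sequence[str] = ("Evicted", "Pending")
) -> Dict[str, List[str]]:
    """Group 'kubectl describe pod' commands by pod status."""
    commands: Dict[str, List[str]] = {status: [] for status in statuses}
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: NAMESPACE NAME READY STATUS RESTARTS AGE
        if len(fields) == 6 and fields[3] in commands:
            namespace, pod = fields[0], fields[1]
            commands[fields[3]].append(f"kubectl describe pod {pod} -n {namespace}")
    return commands


sample = """\
onap-aai hbase-3471984843-5vfpv 0/1 Evicted 0 3h
onap-aai hbase-3471984843-n0528 0/1 Pending 0 2h
onap-aai aai-service-749944520-6261d 0/1 Init:ImagePullBackOff 0 3h
"""
for status, cmds in describe_commands(sample).items():
    print(status, len(cmds))
    for cmd in cmds:
        print("  " + cmd)
```

Rows with any other status, such as the image pull failures, are ignored here so that the output stays focused on eviction and scheduling.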
Manoj Nair
Hi ,
I tried to install ONAP Beijing using OOM/Rancher/Kubernetes/Helm and noticed that some of the pods are stuck in the "CrashLoopBackOff" state, and some are in the Running state with Ready at 0/1. Before confirming whether this is an issue with the installation or the component configuration, I would like to know if there is a way to selectively restart pods and pull images afresh. I found the command "helm del dev --purge", but this deletes all the pods and takes too much time.
Nalini Varshney
Hi,
I tried to install ONAP Beijing using OOM; my system configuration is 350G RAM, 72 CPUs, 3T disk.
I tried to install the ONAP Beijing services one by one using "helm install local/service_name". I successfully installed 22 services, but the services I installed after that are stuck in the Pending state:
wobbly-peahen-aaf-cm-89f97c566-k9gjs 0/1 Pending 0 1h
wobbly-peahen-aaf-create-config-xdc4m 0/1 Pending 0 1h
wobbly-peahen-aaf-cs-866cdc88d4-4l65x 0/1 Pending 0 1h
wobbly-peahen-aaf-fs-84d68bff57-lwqzt 0/1 Pending 0 1h
wobbly-peahen-aaf-gui-5b7dc87599-rcf8w 0/1 Pending 0 1h
wobbly-peahen-aaf-hello-78499cd7c6-w9vwc 0/1 Pending 0 1h
wobbly-peahen-aaf-locate-7d648868f4-5vsb4 0/1 Pending 0 1h
wobbly-peahen-aaf-oauth-769895f9d9-mbmsz 0/1 Pending 0 1h
wobbly-peahen-aaf-service-c4d74d86f-rwtjg 0/1 Pending 0 1h
wobbly-peahen-aaf-sms-676c75759f-lm8p5 0/1 Pending 0 1h
wobbly-peahen-aaf-sms-quorumclient-0 0/1 Pending 0 1h
wobbly-peahen-aaf-sms-vault-0 0/2 Pending 0 1h
I described the pod and found
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 21s (x226 over 1h) default-scheduler No nodes are available that match all of the predicates: Insufficient pods (1).
I don't understand what I can do to resolve this problem, or where the logs are located.
Please guide me on how to debug these errors.
Michael O'Brien
There is a default pod limit of 110 - try deploying on a cluster with at least 2 nodes
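The FailedScheduling message quoted above names the exhausted resource directly ("Insufficient pods"). As a minimal Python sketch, assuming the message keeps the "Insufficient <resource> (<n>)" shape shown in this thread, the resource names can be pulled out with a regular expression. The function name is hypothetical, and my reading that the number in parentheses counts the nodes failing that check is an assumption.

```python
import re
from typing import Dict


def insufficient_resources(message: str) -> Dict[str, int]:
    """Map each 'Insufficient <resource> (<n>)' entry to its parenthesised number."""
    pairs = re.findall(r"Insufficient ([\w.\-/]+) \((\d+)\)", message)
    return {name: int(count) for name, count in pairs}


# Message copied from the describe output above.
event = (
    "No nodes are available that match all of the predicates: "
    "Insufficient pods (1)."
)
print(insufficient_resources(event))
```

A result naming pods rather than cpu or memory points at the per-node pod limit, which matches the advice to add a second node.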
Nalini Varshney
Thanks Michael O'Brien,
Is any script (Rancher server) available to create a cluster with multiple nodes? Previously I created a cluster using the oom_rancher_setup.sh script.
Please guide me on how I can add another node to the existing cluster.
Xiaobo Chen
Hi,
I am struggling to deploy ONAP Beijing via OOM with 2 nodes. I found that aaf and clamp are in CrashLoopBackOff. I tried to re-deploy many times, but the problem persists. I noticed that the aai-champ issue I met before was resolved by "Merge "Fix aai-champ service" into beijing" (Borislav Glozman), so I believe there should be some fixes for aaf and clamp as well. Could anybody look at this issue?
onap@onap-PowerEdge-R730:~$ kubectl get pods -n onap -o wide|grep 0/
dev-aaf-cm-f4dc85d89-qzrqv 0/1 Init:1/2 2 22h 10.42.94.101 onap-poweredge-r730
dev-aaf-fs-9b76f984-q6tv6 0/1 Init:1/2 2 22h 10.42.81.245 onap-poweredge-r730
dev-aaf-gui-9596b58c8-62rrr 0/1 Init:1/2 14 22h 10.42.148.54 onap-poweredge-r730-2
dev-aaf-hello-77b4fdb4b7-9qqjt 0/1 Init:1/2 2 22h 10.42.163.97 onap-poweredge-r730
dev-aaf-locate-57c96f8bb9-spn7r 0/1 CrashLoopBackOff 204 22h 10.42.213.96 onap-poweredge-r730-2
dev-aaf-oauth-84f5b88468-228cn 0/1 Init:1/2 2 22h 10.42.95.154 onap-poweredge-r730
dev-aaf-service-655664c7cb-5g8sb 0/1 Init:1/2 14 22h 10.42.115.143 onap-poweredge-r730-2
dev-clamp-dash-es-57bc9dc595-89f5f 0/1 CrashLoopBackOff 384 22h 10.42.148.176 onap-poweredge-r730-2
dev-clamp-dash-kibana-5d6fbb6d7-hkc5m 0/1 Init:CrashLoopBackOff 112 22h 10.42.1.38 onap-poweredge-r730
Xiaobo Chen
Clamp came up after restarting hundreds of times, but aaf is still in CrashLoopBackOff status.
onap@onap-PowerEdge-R730:~$ kubectl get pods -n onap|grep clamp
dev-clamp-56444db86d-8rph6 2/2 Running 174 2d
dev-clamp-dash-es-57bc9dc595-89f5f 1/1 Running 778 2d
dev-clamp-dash-kibana-5d6fbb6d7-hkc5m 1/1 Running 0 2d
dev-clamp-dash-logstash-6dc6874d68-lkt8m 1/1 Running 0 2d
dev-clampdb-7f579686d-skxpk 1/1 Running 0 2d
onap@onap-PowerEdge-R730:~$ kubectl get pods -n onap|grep aaf
dev-aaf-cm-f4dc85d89-qzrqv 0/1 Init:1/2 2 2d
dev-aaf-cs-7b7648974c-sbxgs 1/1 Running 0 2d
dev-aaf-fs-9b76f984-q6tv6 0/1 Init:1/2 2 2d
dev-aaf-gui-9596b58c8-62rrr 0/1 Init:1/2 14 2d
dev-aaf-hello-77b4fdb4b7-9qqjt 0/1 Init:1/2 2 2d
dev-aaf-locate-57c96f8bb9-spn7r 0/1 CrashLoopBackOff 569 2d
dev-aaf-oauth-84f5b88468-228cn 0/1 Init:1/2 2 2d
dev-aaf-service-655664c7cb-5g8sb 0/1 Init:1/2 14 2d
Vidhu Shekhar Pandey
Can you describe the pod to see if it is a liveness or readiness issue?
I faced a similar CrashLoopBackOff for dev-aaf-oauth and dev-aaf-service yesterday. Increasing the liveness and readiness delays in the following file solved the problem:
oom/kubernetes/aaf/values.yaml
liveness:
  initialDelaySeconds: 180
  periodSeconds: 10
  # necessary to disable liveness probe when setting breakpoints
  # in debugger so K8s doesn't restart unresponsive container
  enabled: true
readiness:
  initialDelaySeconds: 60
  periodSeconds: 10
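To check whether changed delays actually reached a running pod, one option is to read the Liveness and Readiness lines that kubectl describe pod prints, where delay= corresponds to initialDelaySeconds and period= to periodSeconds. The Python sketch below assumes the line format seen in describe output in this thread ("Liveness: tcp-socket :8095 delay=10s timeout=1s period=10s ..."). The function name is hypothetical.

```python
import re
from typing import Dict


def probe_settings(describe_text: str) -> Dict[str, Dict[str, int]]:
    """Collect delay/timeout/period (in seconds) for each probe line found."""
    probes: Dict[str, Dict[str, int]] = {}
    for line in describe_text.splitlines():
        head, _, rest = line.strip().partition(":")
        if head in ("Liveness", "Readiness"):
            found = re.findall(r"(delay|timeout|period)=(\d+)s", rest)
            probes[head] = {key: int(value) for key, value in found}
    return probes


# Sample lines in the format printed by kubectl describe pod.
sample_probes = """\
    Liveness: tcp-socket :8095 delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness: tcp-socket :8095 delay=10s timeout=1s period=10s #success=1 #failure=3
"""
print(probe_settings(sample_probes))
```

Comparing the reported delay with the value set in values.yaml shows whether the pod was redeployed with the new settings.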
Xiaobo Chen
Hi,
I listed the description of the CrashLoopBackOff pod dev-aaf-locate-57c96f8bb9-spn7r below. I am not sure whether it is related to liveness or readiness, as there are no liveness or readiness events in the log. I will try increasing the liveness and readiness delays as you suggested. Hopefully that solves the issue.
Thanks for your help!
onap@onap-PowerEdge-R730:~/setup/oom/kubernetes/aaf$ kubectl describe pod/dev-aaf-locate-57c96f8bb9-spn7r -n onap
Name: dev-aaf-locate-57c96f8bb9-spn7r
Namespace: onap
Node: onap-poweredge-r730-2/192.168.1.4
Start Time: Mon, 02 Jul 2018 12:09:29 +0800
Labels: app=aaf-locate
pod-template-hash=1375294665
release=dev
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"onap","name":"dev-aaf-locate-57c96f8bb9","uid":"a4232883-7dad-11e8-8077-025c689d2...
Status: Running
IP: 10.42.213.96
Created By: ReplicaSet/dev-aaf-locate-57c96f8bb9
Controlled By: ReplicaSet/dev-aaf-locate-57c96f8bb9
Init Containers:
aaf-locate-job-complete:
Container ID: docker://a92beac18a80c74c41166a84ee0530538e6410a1d9a7691221e66fc31037573a
Image: oomk8s/readiness-check:2.0.0
Image ID: docker-pullable://oomk8s/readiness-check@sha256:7daa08b81954360a1111d03364febcb3dcfeb723bcc12ce3eb3ed3e53f2323ed
Port: <none>
Command:
/root/job_complete.py
Args:
-j
dev-aaf-create-config
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 02 Jul 2018 17:19:04 +0800
Finished: Mon, 02 Jul 2018 17:26:16 +0800
Ready: True
Restart Count: 14
Environment:
NAMESPACE: onap (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-js5jz (ro)
aaf-locate-readiness:
Container ID: docker://5595cfbe612b991223cdfb4110e09b7dce2bfcb8cd1d434864c2fc08379e8f37
Image: oomk8s/readiness-check:2.0.0
Image ID: docker-pullable://oomk8s/readiness-check@sha256:7daa08b81954360a1111d03364febcb3dcfeb723bcc12ce3eb3ed3e53f2323ed
Port: <none>
Command:
/root/ready.py
Args:
--container-name
aaf-cs
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 02 Jul 2018 17:26:21 +0800
Finished: Mon, 02 Jul 2018 17:26:22 +0800
Ready: True
Restart Count: 0
Environment:
NAMESPACE: onap (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-js5jz (ro)
Containers:
aaf-locate:
Container ID: docker://be180ca3c21fbb7d890138db2e97dc5d7a60c12b0fd9162856cbea458a8b892f
Image: nexus3.onap.org:10001/onap/aaf/aaf_locate:2.1.1
Image ID: docker-pullable://nexus3.onap.org:10001/onap/aaf/aaf_locate@sha256:9ed5cecf24c692c72182ac4e804ffc37d9bb0018c981ae675369b5585b0fe954
Port: <none>
Command:
/bin/bash
-c
ln -s /opt/app/osaaf/data /data;/opt/app/aaf/locate/bin/locate
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 06 Jul 2018 11:26:13 +0800
Finished: Fri, 06 Jul 2018 11:26:14 +0800
Ready: False
Restart Count: 1039
Liveness: tcp-socket :8095 delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: tcp-socket :8095 delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
CASSANDRA_CLUSTER: cassandra_container
Mounts:
/etc/localtime from localtime (ro)
/opt/app/osaaf from aaf-persistent-vol (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-js5jz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
localtime:
Type: HostPath (bare host directory volume)
Path: /etc/localtime
aaf-persistent-vol:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: dev-aaf-pvc
ReadOnly: false
default-token-js5jz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-js5jz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 43m (x1044 over 3d) kubelet, onap-poweredge-r730-2 pulling image "nexus3.onap.org:10001/onap/aaf/aaf_locate:2.1.1"
Warning BackOff 14m (x23652 over 3d) kubelet, onap-poweredge-r730-2 Back-off restarting failed container
Warning FailedSync 3m (x23716 over 3d) kubelet, onap-poweredge-r730-2 Error syncing pod
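When reading a long `kubectl describe pod` dump like the one above, it can help to pull out only the container state and probe fields. The following is a minimal sketch and not part of OOM; the function name `probe_summary` is made up for illustration, and it only filters text supplied on standard input.

```shell
# Print only the container state and probe fields from `kubectl describe pod`
# output read on stdin, with any leading indentation removed.
# probe_summary is a hypothetical helper name, not an OOM or kubectl command.
probe_summary() {
  sed -n -E 's/^[[:space:]]*((State|Last State|Reason|Exit Code|Restart Count|Liveness|Readiness):.*)$/\1/p'
}

# Example (requires a working kubectl context, so it is left commented out):
# kubectl describe pod dev-aaf-locate-57c96f8bb9-spn7r -n onap | probe_summary
```

In the output above, the aaf-locate container's last state is Terminated/Completed with exit code 0 about one second after it started, and there are no probe-failure events. That may mean the process is exiting on its own rather than being killed by a probe, in which case longer probe delays alone might not help.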
Borislav Glozman
Hi,
I have seen this kind of problem when there was no NFS share between my k8s nodes at /dockerdata-nfs.
Kiran Kamineni
Is there any plan to move to Kubernetes 1.9 for the next release? 1.9 has some features that allow for automatic sidecar injection when using istio.
Michael O'Brien
Yes, we would like to.
OOM-1133
We are currently limited by our version of Rancher 1.6.18
There is rancher 2.0 in the queue as well.
LOG-327
You are welcome to join the effort testing these.
Nalini Varshney
Hi Team,
I created a Kubernetes environment with 1 Rancher server and 3 worker nodes.
Rancher server configuration is : 16G RAM, 16 CPU, 160 HDD
On Rancher Server
Worker Node Configuration is : 32G RAM, 16 CPU, 320 HDD each
On Worker Node
I followed the link below for the installation:
https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html#onap-on-kubernetes-with-rancher
Once the cluster was created successfully, I started setting up ONAP on the existing environment using the link below:
https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html#quick-start-label
When I ran the command "helm install local/onap -n dev --namespace onap",
ONAP pod creation was scheduled on one worker node only, and after 66 pods were created the machine disconnected or hung.
I want to know why all pods are scheduled on one worker node, and when they will be scheduled on the other worker nodes.
On the Rancher dashboard this worker's state is "disconnected". How can I resolve this problem?
I need urgent help. Please guide me.
Borislav Glozman
I use an environment with six 32 GB RAM nodes to run the ONAP Beijing release. Less than that is unstable for me.
It is not clear why only one worker node is used. Please check whether the other nodes are connected in the Hosts section of the Rancher UI.
Nalini Varshney
Hi Borislav Glozman, thank you.
Yes, all worker nodes are connected and in active mode. I don't know why everything is scheduled on only one worker node.
Please help me debug the problem, and suggest another approach if there is one.
Please find the attached image for reference. It shows the current state of my environment.
Thank you.
Borislav Glozman
You should increase the number of nodes to 6 and retry.
Nalini Varshney
Thanks Borislav Glozman, but my question is why everything was scheduled on only one worker node when I have 3 worker nodes.
If I create an environment with 6 nodes, will this problem be resolved automatically?
Borislav Glozman
Hopefully.
Make sure you clean-up the environment before you retry.
Nalini Varshney
Hi Borislav Glozman, I configured my environment with 6 nodes and all the nodes are active in the Rancher UI, but when I run the kubectl get nodes command in a terminal it displays only one worker node.
Please advise what the problem with my setup is and how I can resolve these issues.
Please find the attached image for reference.
I would also like to know: the kube-system services are running on only one worker node and shared with the other worker nodes. Is that OK?
Borislav Glozman
You need to have 6 nodes in "kubectl get nodes" output. Something in your env is still not working.
Also, take a look at Kubernetes → Infrastructure Stacks. Maybe there are errors.
I would suggest deleting the k8s env in the Rancher UI and creating a new one, adding all 6 hosts.
Also, make sure the hosts do not include the Rancher VM.
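The node check described above can be scripted. This is a rough sketch that assumes the default `kubectl get nodes` table layout, with STATUS in the second column; `ready_node_count` is a hypothetical helper name.

```shell
# Count nodes whose STATUS column is exactly "Ready" in `kubectl get nodes`
# output read on stdin. The header row is skipped. Nodes reported as
# "NotReady" or "Ready,SchedulingDisabled" are deliberately not counted.
ready_node_count() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Example (requires a working kubectl context, so it is left commented out):
# kubectl get nodes | ready_node_count
```

For the 6-host setup discussed here, the expected result would be 6. A lower number suggests that some hosts registered in the Rancher UI never joined the Kubernetes cluster.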
mudassar rana
Hi Guys,
I am planning to install the ONAP Beijing release using the ONAP-on-Kubernetes approach. I don't want to use the Heat template for ONAP installation.
I have a few queries:
1> For the Kubernetes environment, I don't want to use the Rancher approach. How, then, can I update the kubectl config? The OOM User Guide says to paste the kubectl config from Rancher.
2> I have a high-end server with 256 GB RAM. Do I still require a multi-node setup?
3> If I go ahead with Rancher, can I install ONAP directly on the server, or do I have to install OpenStack Ocata and create an instance on it?
user-55883
Hi mudassar rana,
did you get any answer to question number 3? I am also planning to install ONAP Beijing release on Kubernetes but wondering if I still need to install openstack first or not.
Thanks.
Roger Maitland
1> The official documentation is for Rancher however there is a wiki page with alternatives (ONAP on Alternative Environments).
2> With OOM one can deploy parts or all of ONAP so it's hard to forecast if one server is sufficient.
3> OpenStack is required if you're going to instantiate any VNFs but not for ONAP itself.
Cheers,
Roger
Chenglong Wang
Hi,
I'm trying to install ONAP Beijing with OOM. The message-router pod fails with "connection refused" on port 3904. Can anyone suggest a fix?
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m default-scheduler Successfully assigned dmaap-dev-message-router-68b49998bf-h7w4x to k8s-2
Normal SuccessfulMountVolume 9m kubelet, k8s-2 MountVolume.SetUp succeeded for volume "cadi"
Normal SuccessfulMountVolume 9m kubelet, k8s-2 MountVolume.SetUp succeeded for volume "appprops"
Normal SuccessfulMountVolume 9m kubelet, k8s-2 MountVolume.SetUp succeeded for volume "mykey"
Normal SuccessfulMountVolume 9m kubelet, k8s-2 MountVolume.SetUp succeeded for volume "default-token-rtdjb"
Normal SuccessfulMountVolume 9m kubelet, k8s-2 MountVolume.SetUp succeeded for volume "localtime"
Normal Pulled 9m kubelet, k8s-2 Container image "oomk8s/readiness-check:2.0.0" already present on machine
Normal Started 9m kubelet, k8s-2 Started container
Normal Created 9m kubelet, k8s-2 Created container
Warning Unhealthy 7m (x6 over 8m) kubelet, k8s-2 Readiness probe failed: dial tcp 10.42.229.95:3904: getsockopt: connection refused
Normal Pulled 7m (x2 over 8m) kubelet, k8s-2 Container image "nexus3.onap.org:10001/onap/dmaap/dmaap-mr:1.1.4" already present on machine
Normal Killing 7m kubelet, k8s-2 Killing container with id docker://message-router:Container failed liveness probe.. Container will be killed and recreated.
Normal Created 7m (x2 over 8m) kubelet, k8s-2 Created container
Normal Started 7m (x2 over 8m) kubelet, k8s-2 Started container
Warning Unhealthy 4m (x11 over 8m) kubelet, k8s-2 Liveness probe failed: dial tcp 10.42.229.95:3904: getsockopt: connection refused
Nalini Varshney
Hi,
I installed the ONAP Beijing release using OOM, and I want to check the list of installed services and their state. How can I check the list of active services and how the services interact with each other?
How can I verify that all the services were installed successfully?
Is there any document available on this?
Kumar Lakshman Kumar
Hi Nalini,
On your machine you can run these commands:
kubectl get pods --all-namespaces → shows whether all component pods are up and running
kubectl get services --all-namespaces → shows the service IPs, the ports exposed by each service, and which ports are mapped to the host machines
/oom/kubernetes/robot/ete-k8s.sh health → runs a basic health check to validate that the ONAP components are working
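With as many pods as ONAP creates, a tally per status is often easier to read than the full listing. A small sketch, assuming the column layout of `kubectl get pods --all-namespaces` (STATUS in the fourth column); `pod_status_counts` is a hypothetical helper name.

```shell
# Tally pods by the STATUS column of `kubectl get pods --all-namespaces`
# output read on stdin. The header row is skipped. Prints "<status> <count>"
# lines sorted by status name.
pod_status_counts() {
  awk 'NR > 1 && NF >= 4 { count[$4]++ } END { for (s in count) print s, count[s] }' |
    LC_ALL=C sort
}

# Example (requires a working kubectl context, so it is left commented out):
# kubectl get pods --all-namespaces | pod_status_counts
```

Anything other than Running or Completed in the result is a candidate for `kubectl describe pod`.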
Nalini Varshney
Hi Kumar Lakshman Kumar,
Thank you for Response.
I ran all these commands.
When I ran "kubectl get services --all-namespaces" I found that some pods are in an error state and some are terminated, but these all belong to different services.
So I want to check the status of all the active services, and how these services communicate with each other.
If any documents or links are available on this, please share them with me.
Kumar Lakshman Kumar
Hi Nalini,
All components interact with each other using their internal APIs. If you want to know the state of a particular component you can run the health check `/oom/kubernetes/robot/ete-k8s.sh health`; if the result looks good for a component, you can use it.
ONAP has many components. If I were in your place I would analyze my use case and look only at the components of interest; the developer wiki is a nice place to start to understand a component.
The Beijing documentation is at http://onap.readthedocs.io/en/beijing/release/index.html#documentation
Roger Maitland
The OOM User Guide is located here: https://docs.onap.org/en/beijing/submodules/oom.git/docs/oom_user_guide.html
Lukasz Rajewski
Hi,
Is the cd.sh ONAP installation script still valid, or should it no longer be used for the Beijing release?
Nalini Varshney
Hi Lukasz Rajewski,
The ONAP Beijing installation procedure is different. You can follow the link below:
http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_cloud_setup_guide.html#cloud-setup-guide-label
Narayanan Krishnamurthy
Hi Michael O'Brien and all, I am running Kubernetes in an AWS clustered environment with 1 master + 4 nodes. Some portals are running on different nodes, so as per readthedocs I guess I should create a separate Elastic IP for each node so that the addresses won't change on restarts. I would then also have to configure them in my local hosts file to map to the different URLs, e.g.:
xx.85.74.60 portal.api.simpledemo.onap.org
xx.205.233.223 vid.api.simpledemo.onap.org
xx.224.215.184 sdc.api.fe.simpledemo.onap.org
xx.205.233.223 portal-sdk.simpledemo.onap.org
xx.229.138.155 policy.api.simpledemo.onap.org
xx.224.215.184 aai.api.sparky.simpledemo.onap.org
xx.224.215.184 cli.api.simpledemo.onap.org
As Elastic IPs in AWS are limited, how is your cluster demo handling this? Please note that we are not configuring Route 53 DNS; just to try things out, we will stick to Elastic IPs for accessing the portals.
Thanks in advance,
Michael O'Brien
Kubernetes takes care of this for you via service routing, using the DNS service inside k8s.
You only need an EIP for the master, so that the Rancher host registration does not change.
The cluster nodes can keep their ephemeral IPs as long as the VMs are not rebooted.
I use route53 only so I register a domain name not an ip in rancher - and to keep a github oauth up during cluster rebuilds - to keep the crypto miners off the 10250 port.
Ideally you access all the above GUI's (aai, policy etc) from portal - which is different - it is using a non-ELB rancher supplied load balancer implementation - so runs from any VM host.
Narayanan Krishnamurthy
Thanks a lot, that clarifies it. When running the OOM scripts I pass the server parameter as the private domain name so that it won't change on reboot. I noticed something strange when restarting the VMs: Portal users other than Demo were missing. Is this a known issue?
Michael O'Brien
Heads up we are upgrading the RI and CD systems from Rancher 1.6.14 and Helm 2.8.2 to Rancher 1.6.18 and Helm 2.9.1
https://lists.onap.org/g/onap-discuss/topic/oom_integration_helm/24628483?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,24628483
user-55883
Hello Folks,
I am new to ONAP and I am planning to install the ONAP Beijing on Kubernetes. I am currently reading this section "ONAP OOM Beijing - Hosting docker images locally" and I have a question.
From this document https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html#onap-on-kubernetes-with-rancher it looks like OpenStack is needed to install Rancher and Kubernetes. My questions are:
Appreciate all your response.
thank you.
Michael O'Brien
Onap will come up on one or more ubuntu 16 metal/vms regardless of the undercloud (aws, Azure, gcd, openstack...) as long as you use rancher to get a default LoadBalancer. You only need openstack to point SO at to instantiate VMs currently, until the cloud plugins are finished.
There are 5 installation scripts around; one of the early ones is in my logging-analytics repo under deploy - still running helm install instead of the helm plugin. My CD system runs on AWS, for example, and my dev system on an Ubuntu VM on VMware on my laptop (no OpenStack).
user-55883
Thank you for the response Michael.
So if I can get one server that supports the below hardware requirements (+ install Ubuntu 16), then that should be enough to bring up ONAP OOM Beijing on Kubernetes, correct?
OOM Hardware Requirements:
RAM: 128GB
HD: 160GB
vCores: 32
thanks in advance.
Vijendra Rajput
Is there any plan to use the AWS-native EKS service for ONAP deployment?
I am interested in that project if it is already running; otherwise we can start one.
Roger Maitland
OOM is based on K8s and Helm which enables EKS and other K8s native based installs; however, I don't know of anyone actually doing this yet. Have you tried this?
Michael O'Brien
I have a cluster up - it runs a minimum of $5/day without the EC2 costs - attempting to get a deployment going via https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html
LOG-554
Zhihua Ye
After I deployed the Helm server (Tiller), I set up the Tiller service account, but the patch failed:
# sudo kubectl create serviceaccount --namespace kube-system tiller
serviceaccount "tiller" created
# sudo kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
clusterrolebinding "tiller-cluster-rule" created
# sudo kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
deployment "tiller-deploy" not patched
I then checked the deployment info with: sudo kubectl --namespace=kube-system edit deployment/tiller-deploy
It shows:
schedulerName: default-scheduler
securityContext: {}
serviceAccount: io-rancher-system
serviceAccountName: io-rancher-system
terminationGracePeriodSeconds: 30
Should I change the serviceAccount and serviceAccountName?
Michal Mazurek
Hi all, I've tried to install ONAP via OOM (with dcaegen2 disabled, since I don't want to use DCAE) using https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_quickstart_guide.html. But most of the pods are in CrashLoopBackOff/ContainerCreating/Error/ErrImagePull/Init etc. states. I have increased the liveness/readiness delays for aaf as stated here in the comments. Are there any caveats on how to make it work? Is Rancher mandatory for deployment?
My environment is VMware ESXi + KubernetesAnywhere (proper kubernetes and helm version in place). I have 4 nodes + 1 master, each 8 vCPUs & 32GB ram.
Michael O'Brien
You don't need rancher - but it is our reference implementation (used to bootstrap kubernetes and help setup the cluster - it also provides a default loadbalancer implementation - so you don't have to configure for example AWS ELB/ALB)
If you use something else like kubeadm - any issues you have - you are kind of on your own - hence why we are standardizing on one implementation to minimize undercloud variances.
If more than a couple (less than 5) have imagepull errors - either nexus3 is hammered (not likely usually) - or you are having issues inside your network pulling from nexus3
Michal Mazurek
I am rather sure that cluster works just fine. Could it be caused by the fact that for networking I am using flannel?
Michael O'Brien
It is hard to tell from the docs if the rancher-cni implementation is only for cattle or kubernetes in our case. I don't think they use flannel for CNI - but I am not 100% sure yet until I verify it
79e4e19bf040 rancher/net:v0.13.17 "/rancher-entrypoi..." 2 weeks ago Up 2 weeks r-ipsec-cni-driver-1-e8bee4f4
https://rancher.com/docs/rancher/v1.6/en/rancher-services/networking/#
Michael O'Brien
clipboard for https://lists.onap.org/g/onap-discuss/topic/portal_problem_with/28285637?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,28285637
user-37356
I am not sure if here is the right place to ask my question. Please advise if it is not ...
I am working on ONAP deployment options and I am investigating the ONAP OOM option with Kubernetes on Openstack.
I installed a few ONAP Beijing components with chart version 2.0.0 (such as SO, SDC, Portal, …) on an Openstack cloud by installing a Kubernetes cluster with Rancher in multiple hosts to manage the entire life-cycle of my ONAP installation.
The setup was completed and most of the pods were running with no issues.
My question now is about upgrading this light instance of ONAP from Beijing to Casablanca without losing any data or causing any downtime. I figured out that different infrastructure versions of Helm/Kubernetes/Rancher/kubectl are supported on the Beijing and Casablanca releases. So I managed (not an easy task!) to upgrade them accordingly, but some of the installed ONAP Beijing containers went down. I also added repos locally for Casablanca with chart version 3.0.0. However, my attempts at the platform upgrade have not been successful so far (neither from the yaml file nor component by component).
All the examples and guides in Wiki are for minor upgrades in same release and I could not find anywhere mentioning on how to upgrade/rollback between major releases (e.g., Beijing-Casablanca). Is this supported and tested? If so, could you please give me the guideline links?
I even tried installing single components (e.g., SO) in Beijing and upgrading them to Casablanca, but that was not successful either. Any idea of the reason?
Roger Maitland
Major upgrades are still a work in progress. The OOM team is working on some capabilities to assist in upgrades, but this will not be sufficient in itself - there will always be work required from the project teams to make major upgrades seamless. One of the more challenging aspects is dealing with schema changes across versions, which require migration scripts and versioned APIs. You may want to follow: OOM-9
Cheers, Roger
Michal Mazurek
Hi guys,
I have deployed k8s cluster according to: https://onap.readthedocs.io/en/casablanca/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html#onap-on-kubernetes-with-rancher (except I didn't update kubeconfig on k8s nodes since I run kubectl from my machine locally, all scripts were copy pasted, also plugins for helm i.e. deploy/undeploy had its owner changed from root to user from which I ran commands - otherwise these were failing with permission denied)
Now trying to deploy onap on it using: https://onap.readthedocs.io/en/casablanca/submodules/oom.git/docs/oom_quickstart_guide.html (+ I ran helm init to start tiller on k8s)
Do I need a working openstack (to which onap is supposed to connect to) to make that deployment successful?
Versions of the components (as read from the client machine that is used to deploy):
helm version
kubectl version
While trying to deploy I am getting (even though it shows "deployed", no pods are created at all on k8s cluster):
Michael O'Brien
You don't need a working openstack unless you plan to deploy VNFs - if that is the case rework your SO config - see dev.yaml and restart your pods
https://git.onap.org/oom/tree/kubernetes/onap/resources/environments/dev.yaml#n122
I have the following - I run my script fine using sudo
https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh
Note that the versions below are for Rancher 1.6.25, which is in the queue for LOG-895
- but your timeouts during deployment are not good - they look random - are you running a fast enough master VM (vCore, ram, network) - looks like your cluster may be saturated - check top on the vm.
Michal Mazurek
True, there is something wrong with resources - I can see it now at the host level.
user-764e9
Hi,
Could anyone tell me which OpenStack version is compatible as the infrastructure for an OOM installation? Is Ocata still OK, or should we go to a higher version such as Pike? I installed Beijing on Ocata.
Michael O'Brien
I think we are still good with Ocata - as the Windriver openlab is unchanged since Beijing I think
http://10.12.25.2/auth/login/?next=/project/instances/
TIAN LUO
I encountered some errors when undeploying ONAP:
[root@localhost ~]# helm undeploy dev –purge
release "dev-vnfsdk" deleted
Error: invalid release name, must match regex ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])+$ and the length must not longer than 53
release "dev-vid" deleted
Error: invalid release name, must match regex ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])+$ and the length must not longer than 53
release "dev-vfc" deleted
Error: invalid release name, must match regex ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])+$ and the length must not longer than 53
release "dev-uui" deleted
...
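One possible reading of the errors above, assuming the command was typed exactly as pasted: the `–purge` argument begins with an en dash rather than two ASCII hyphens, so Helm may be treating it as a release name, which then fails the regex quoted in the error message. The sketch below checks a candidate name against that same regex and the 53-character limit; `is_valid_release_name` is a hypothetical helper, not a Helm command.

```shell
# Return success if the argument satisfies the release-name rule quoted in
# the Helm error message above: it must match the regex and be at most
# 53 characters long. is_valid_release_name is a made-up helper name.
is_valid_release_name() {
  local name="$1"
  # Length limit taken from the error text.
  [ "${#name}" -le 53 ] || return 1
  # Regex copied verbatim from the error text.
  printf '%s\n' "$name" |
    grep -Eq '^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])+$'
}

# Examples:
# is_valid_release_name dev       && echo "dev is accepted"
# is_valid_release_name '–purge'  || echo "en-dash argument is rejected"
```

If this is the cause, retyping the option by hand as two ASCII hyphens followed by purge would be worth trying. Copying commands from a wiki page can silently convert a double hyphen into an en dash.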
Manish Kumar
Hi TIAN LUO,
I tried steps written in below link and it worked for me.
K8S / helm basic commands for ONAP integration
Michael O'Brien
Post the yaml you used and your deploy command and helm undeploy in a Jira - we can address it
I have seen this restriction in the past - just can't remember the context
create in OOM under Jira - hit the create button above the following - we can track things then
OOM-1560
Syed Atif Husain
Hi,
I have deployed Casablanca (few components only) using OOM on K8S on Azure VMs.
Setup has 1 Rancher VM, 1 master VM and 5 slave VMs.
Deployed components are:
aaf, aai, appc, clamp, cli, esr, log, oof, msb, nbi, policy, portal, robot, sdc, sdnc, so, uui, vfc, vid, vnfsdk
Issue is – many pods are failing with liveness error and I am unable to access portal UI although all portal pods are ‘Running’.
Due to retry of failing pods, the box space gets all used up and after a day I cannot even access the rancher UI to debug.
Kindly advise. Details of some pod issues are below.
I was able to bring up portal-casablanca (from Init:Error) by increasing liveness:DelaySeconds from 10 to 180, as per a suggestion on the wiki.
But this solution hasn’t worked for other pods failing with same liveness error (aaf, oof pods are still failing with liveness error)
aai-data-router is failing with “Readiness probe failed: dial tcp XXXXXX:9502: connect: connection refused”.
Many pods are ‘Running’ with their older version pods in ‘Terminating’ state. [I assumed Terminating ones should get removed but they haven’t]
e.g.
onap dev-portal-portal-app-6f57b79969-28mtj 1/2 Terminating 6 3h 10.42.110.56 onap-vm5 <none>
onap dev-portal-portal-app-6f57b79969-5glxj 2/2 Running 0 1h 10.42.149.141 onap-vm4 <none>
ERROR PODS
onap-rancher@onap-rancher:~/oom/kubernetes$ kubectl get pods --all-namespaces -o=wide | grep Error
onap dev-policy-pap-6cfdf64867-v94q8 0/2 Init:Error 2 1h 10.42.XXX onap-vm6 <none>
onap dev-sdc-sdc-dcae-be-tools-nchp4 0/1 Init:Error 0 5h 10.42.XXX onap-vm6 <none>
onap dev-sdc-sdc-wfd-be-workflow-init-xg2cf 0/1 Init:Error 0 5h 10.XXX onap-vm2 <none>
onap dev-vnfsdk-vnfsdk-init-postgres-2psxl 0/1 Init:Error 0 5h 10.XXXX onap-vm1 <none>
INIT PODS
onap-rancher@onap-rancher:~/oom/kubernetes$ kubectl get pods --all-namespaces -o=wide | grep Init
onap dev-aai-aai-champ-54c559d86c-gf8c2 0/2 Init:0/1 0 1h 10.XXXX onap-vm5 <none>
onap dev-aai-aai-spike-565c54f554-lvjrd 0/2 Init:0/1 5 1h 10.XXXX onap-vm1 <none>
onap dev-appc-appc-dgbuilder-768d9f4fb8-bxsgr 0/1 Init:0/1 0 1h 10.XXXXX onap-vm2 <none>
onap dev-clamp-clamp-dash-kibana-5d5d977fb4-dtj9s 0/1 Init:0/1 0 1h 10.XXXXX onap-vm5 <none>
onap dev-log-log-logstash-746f464bd7-pmgfs 0/1 Init:0/1 0 1h 10.XXXXX onap-vm5 <none>
onap dev-log-log-logstash-746f464bd7-sjpmh 0/1 Init:0/1 0 1h 10.42.123.231 onap-vm5 <none>
onap dev-oof-music-tomcat-64d4c64db7-8bsxq 0/1 Init:0/3 0 1h 10.XXXXX onap-vm5 <none>
onap dev-oof-oof-has-controller-9469b9ff8-jzfb9 0/1 Init:0/3 0 1h 10.XXXXX onap-vm1 <none>
onap dev-oof-oof-has-data-d559897dc-pbzdp 0/1 Init:2/4 17 3h 10.XXXXX onap-vm6 <none>
onap dev-oof-oof-has-reservation-868c7c88ff-tbnlr 0/1 Init:2/4 26 5h 10.XXXXXX onap-vm2 <none>
onap dev-oof-oof-has-solver-6f8bc6fdf4-sklc4 0/1 Init:2/4 31 5h 10.XXXX onap-vm1 <none>
onap dev-policy-brmsgw-5b8f5d6bd5-kt7rp 0/1 Init:0/1 0 1h 10.XXXXX onap-vm5 <none>
onap dev-policy-pap-6cfdf64867-v94q8 0/2 Init:Error 2 1h 10.XXXXX onap-vm6 <none>
onap dev-portal-portal-app-6f57b79969-d4n8x 0/2 Init:0/1 0 1h 10.XXXXX onap-vm5 <none>
onap dev-sdc-sdc-dcae-be-tools-nchp4 0/1 Init:Error 0 5h 10.XXXXX onap-vm6 <none>
onap dev-sdc-sdc-wfd-be-workflow-init-xg2cf 0/1 Init:Error 0 5h 10.XXXXX onap-vm2 <none>
onap dev-sdnc-sdnc-ansible-server-64bbf769d7-776f9 0/1 Init:0/1 1 1h 10.XXXXX onap-vm6 <none>
onap dev-sdnc-sdnc-dmaap-listener-84bffc54-627z2 0/1 Init:0/1 0 5h 10.XXXXX onap-vm2 <none>
onap dev-sdnc-sdnc-portal-5dcd4bfbf6-k48q5 0/1 Init:0/1 9 1h 10.XXXXX onap-vm1 <none>
onap dev-sdnc-sdnc-ueb-listener-6d74459c6-5rtcj 0/1 Init:0/1 0 1h 10.XXXX onap-vm5 <none>
onap dev-so-so-bpmn-infra-58bf4f656d-pt4l5 0/1 Init:0/1 0 1h 10.XXXXX onap-vm5 <none>
onap dev-vid-vid-7c5df6554b-2kkv9 0/2 Init:0/1 0 1h 10.XXXX onap-vm5 <none>
onap dev-vnfsdk-vnfsdk-init-postgres-2psxl 0/1 Init:Error 0 5h 10.XXXX onap-vm1 <none>
Regards,
Atif
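With listings like the ones above, it can be useful to see whether the stuck pods cluster on particular nodes (many of the Init pods shown are on onap-vm5). A minimal sketch, assuming the `kubectl get pods --all-namespaces -o=wide` layout used above, with STATUS in the fourth column and NODE in the eighth; `stuck_pods_by_node` is a hypothetical helper name.

```shell
# Count pods that are neither Running nor Completed, grouped by node, from
# `kubectl get pods --all-namespaces -o=wide` output read on stdin.
# Prints "<node> <count>" lines sorted by node name.
stuck_pods_by_node() {
  awk 'NF >= 8 && $4 != "STATUS" && $4 != "Running" && $4 != "Completed" { n[$8]++ }
       END { for (node in n) print node, n[node] }' |
    LC_ALL=C sort
}

# Example (requires a working kubectl context, so it is left commented out):
# kubectl get pods --all-namespaces -o=wide | stuck_pods_by_node
```

A node with a disproportionate share of stuck pods is worth checking for disk, memory, or NFS mount problems before looking at the individual charts.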
Michael O'Brien
Hi, you are missing dmaap - therefore sdnc, sdc, aai will not start without it - also add consul and msb and multicloud
Log Streaming Compliance and API#DeploymentDependencyTree-Containerlevel
Also retries of config containers are normal - as long as at least one of the set is 0/1 Completed
Try the 3.0.0-ONAP tag from mid Dec - it has been vetted and is working fine except for AAI.
Note that Casablanca's 3.0.1-ONAP has not been cut yet - hence possible instability
for deploy order consult
https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n219
oof so sdc sdnc vid policy portal log vfc uui vnfsdk appc clamp cli pomba
Syed Atif Husain
Thanks Michael O'Brien. I will add dmaap, consul, msb and multicloud and reinstall.
I added fewer components as I have only 6 VMs (4 vCPUs and 16 GB RAM each, with 100 GB NFS for all 6).
I will still not add dcaegen2 as that requires more VMs, as per my understanding
Syed Atif Husain
Are 6 VMs (16 GB RAM each and 1000 GB NFS) enough for installing onap without dcae?
My install keeps failing with one component or another not deployed, and it is always a different VM.
I am wondering whether this is due to having too few VMs.
Jiadong Wu
Hi,
I plan to install the ONAP C version in our lab. Should OpenStack be ready before I install ONAP? I mean, can I install ONAP on a physical server directly, without OpenStack?
Ekko Chang
Yes
As I know, ONAP can be installed on any physical server without openstack.
Keong Lim
OpenStack is not necessary, but it is useful.
You could also refer to Installing A&AI on a (barebones) Ubuntu 16.04 machine using OOM
Or https://lists.onap.org/g/onap-discuss/message/16335
Both of those describe using a single VM.
Jiadong Wu
Thanks for your reply.
Is it possible to create a VNF (like vFW) in a container via ONAP? I don't have an OpenStack env for now.
Kumar Lakshman Kumar
To Install ONAP, all you need to have is your K8S cluster up with needed resources on your physical machines.
you can setup Openstack later, but while installing ONAP you need to provide the details of VIM.
Jiadong Wu
Thanks for your reply.
Is it possible to create a VNF (like vFW) in a container via ONAP? I don't have an OpenStack env for now.
Kumar Lakshman Kumar
AFAIK it's not possible at the moment. To boot up VNFs there must be an underlying VIM (for example OpenStack),
as the artifacts we upload to SDC during service creation are HEAT/TOSCA based. So having OpenStack up would be nice for booting up VNFs.
Yogi Knss
Hi Michael O'Brien,
We are trying to install Casablanca on Kubernetes and we are facing issues with AAI. AAI is not able to run because of an issue with graphadmin. Can you let us know if we have missed some configuration? We followed the same sequence that you have specified above. All other components "consul, msb, dmaap, dcaegen2, aaf, robot" are up and running.
The following error is seen in "aai-aai-graphadmin-create-db-schema-mjc4z" JOB.
Project Build Version: 1.0.1
chown: changing ownership of '/opt/app/aai-graphadmin/resources/application.properties': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/etc/appprops/aaiconfig.properties': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/etc/appprops/janusgraph-cached.properties': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/etc/appprops/janusgraph-realtime.properties': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/etc/auth/aai_keystore': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/localhost-access-logback.xml': Read-only file system
chown: changing ownership of '/opt/app/aai-graphadmin/resources/logback.xml': Read-only file system
Wed Apr 24 16:09:21 IST 2019 Starting /opt/app/aai-graphadmin/bin/createDBSchema.sh
---- NOTE --- about to open graph (takes a little while)--------;
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
at org.springframework.boot.loader.PropertiesLauncher.main(PropertiesLauncher.java:595)
Caused by: java.lang.ExceptionInInitializerError
at org.onap.aai.dbmap.AAIGraph.getInstance(AAIGraph.java:103)
at org.onap.aai.schema.GenTester.main(GenTester.java:126)
... 8 more
Caused by: java.lang.RuntimeException: Failed to instantiate graphs
at org.onap.aai.dbmap.AAIGraph.<init>(AAIGraph.java:85)
at org.onap.aai.dbmap.AAIGraph.<init>(AAIGraph.java:57)
at org.onap.aai.dbmap.AAIGraph$Helper.<clinit>(AAIGraph.java:90)
... 10 more
Caused by: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:159)
at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration.get(KCVSConfiguration.java:100)
at org.janusgraph.diskstorage.configuration.BasicConfiguration.isFrozen(BasicConfiguration.java:106)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1394)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:164)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:133)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:113)
at org.onap.aai.dbmap.AAIGraph.loadGraph(AAIGraph.java:115)
at org.onap.aai.dbmap.AAIGraph.<init>(AAIGraph.java:82)
... 12 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT1M
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
... 21 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:161)
at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:115)
at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getSlice(AstyanaxKeyColumnValueStore.java:104)
at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:103)
at org.janusgraph.diskstorage.configuration.backend.KCVSConfiguration$1.call(KCVSConfiguration.java:100)
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:148)
at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:162)
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
... 22 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.233.66.28(10.233.66.28):9160, latency=1(1), attempts=1]UnavailableException()
at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:153)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)
at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538)
at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:159)
... 29 more
Caused by: UnavailableException()
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14687)
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14633)
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:14559)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:741)
at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:725)
at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:544)
at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:541)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
... 35 more
Failed to run the tool /opt/app/aai-graphadmin/bin/createDBSchema.sh successfully
Failed to run the createDBSchema.sh
Keong Lim
Hi Yogi Knss, this could be a problem with your Cassandra server, whose Helm charts were changed recently. Please send it to the onap-discuss mailing list, where Mahendra Raghuwanshi could help, or bring it to the AAI Weekly Status Meeting (Cancelled) or the AAI Developers Meeting, where Venkata Harish Kajur could help.
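A quick way to test the Cassandra theory: the innermost cause in the trace is a Cassandra UnavailableException, which Cassandra raises when too few replicas are alive for the requested consistency level. The sketch below filters `nodetool status` output for nodes that are not Up/Normal. The helper name cassandra_nodes_not_up is hypothetical, and the pod name is the AAI Cassandra pod from the pod listing posted further down this thread, so adjust both to your deployment.

```shell
# Hypothetical helper (not part of OOM): read `nodetool status` output on
# stdin and print the address and state of every node that is not UN
# (Up/Normal). Node rows start with a two-letter code such as UN or DN.
cassandra_nodes_not_up() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2, $1 }'
}

# Not run here; pod name and namespace are taken from this thread:
#   kubectl exec -n onap tcs-aai-aai-cassandra-0 -- nodetool status | cassandra_nodes_not_up
```

Empty output means every node that nodetool knows about reports Up/Normal, in which case the keyspace replication settings would be the next thing to look at.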
Mahendra Raghuwanshi
Hey Yogi Knss, is it Casablanca or Dublin?
Yogi Knss
Hi Mahendra Raghuwanshi, it's Casablanca.
Mahendra Raghuwanshi
I will try to help here, though the recent changes are in Dublin.
Can you please paste the pod status, i.e. the output of "kubectl get pods -n onap"?
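With a full ONAP deployment that listing runs to a hundred lines or more, so it can help to trim it to the pods that need attention. A minimal sketch, assuming the default column layout (NAME READY STATUS ...) with a header row; the helper name unhealthy_pods is hypothetical and not part of OOM or kubectl.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# only the rows that look unhealthy (name, READY, STATUS).
unhealthy_pods() {
  awk 'NR == 1 { next }                 # skip the header row
       $3 == "Completed" { next }       # finished jobs are fine
       {
         split($2, ready, "/")          # READY column, e.g. "1/2"
         if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
       }'
}

# Usage (not run here):
#   kubectl get pods -n onap | unhealthy_pods
```

A pod that is Running with only some of its containers ready (for example 1/2) is reported as well, since that usually points at a failing readiness probe.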
Yogi Knss
Hi Mahendra Raghuwanshi
Please find the output for the command: "kubectl get pods -n onap"
[root@node201 kubernetes]# kubectl get pods -n onap
NAME READY STATUS RESTARTS AGE
tcs-aaf-aaf-cm-76696c8bcf-sqqpw 1/1 Running 0 7h
tcs-aaf-aaf-cs-6c69c87d44-vbtk7 1/1 Running 0 7h
tcs-aaf-aaf-fs-5fd8c8bd8d-fbbwb 1/1 Running 0 7h
tcs-aaf-aaf-gui-777fb85d96-vtwnf 1/1 Running 0 7h
tcs-aaf-aaf-hello-774f4b6f5-lb6t4 1/1 Running 0 7h
tcs-aaf-aaf-locate-fbd8f454b-lgkjd 1/1 Running 0 7h
tcs-aaf-aaf-oauth-77fcbf54cd-trdxx 1/1 Running 0 7h
tcs-aaf-aaf-service-9f76d4746-4knwv 1/1 Running 0 7h
tcs-aaf-aaf-sms-5f764cf7cf-vxl7g 1/1 Running 0 7h
tcs-aaf-aaf-sms-quorumclient-0 1/1 Running 0 7h
tcs-aaf-aaf-sms-quorumclient-1 1/1 Running 0 7h
tcs-aaf-aaf-sms-quorumclient-2 1/1 Running 0 6h
tcs-aaf-aaf-sms-vault-0 2/2 Running 3 7h
tcs-aaf-aaf-sshsm-distcenter-gmbwl 0/1 Completed 0 7h
tcs-aaf-aaf-sshsm-testca-69v6k 0/1 Init:Error 0 7h
tcs-aaf-aaf-sshsm-testca-6ghcc 0/1 Init:Error 0 7h
tcs-aaf-aaf-sshsm-testca-rz58d 0/1 Init:Error 0 7h
tcs-aaf-aaf-sshsm-testca-ttbq7 0/1 Init:Error 0 7h
tcs-aaf-aaf-sshsm-testca-xht79 0/1 Completed 0 6h
tcs-aaf-aaf-sshsm-testca-z6sfw 0/1 Init:Error 0 7h
tcs-aai-aai-5cbbdb4ff5-4hwpb 0/1 Init:0/1 45 7h
tcs-aai-aai-babel-68fc787d74-pvdtr 2/2 Running 0 1d
tcs-aai-aai-cassandra-0 1/1 Running 1 1d
tcs-aai-aai-champ-5b64bd67c7-qnx4j 1/2 Running 0 1d
tcs-aai-aai-data-router-6654597bdd-6nfmw 2/2 Running 6 1d
tcs-aai-aai-elasticsearch-68bcf6c8fd-9stb4 1/1 Running 0 1d
tcs-aai-aai-gizmo-d548b4d9-cvf6d 2/2 Running 0 1d
tcs-aai-aai-graphadmin-67f5965fb7-88dwz 0/2 Init:0/1 165 1d
tcs-aai-aai-graphadmin-create-db-schema-mjc4z 0/1 Error 0 1d
tcs-aai-aai-graphadmin-create-db-schema-p5vms 0/1 Error 0 1d
tcs-aai-aai-graphadmin-create-db-schema-sbhsk 0/1 Error 0 1d
tcs-aai-aai-graphadmin-create-db-schema-wqw29 0/1 Error 0 1d
tcs-aai-aai-graphadmin-create-db-schema-zrmpv 0/1 Error 0 1d
tcs-aai-aai-modelloader-6dfbcd7596-9dd9s 2/2 Running 0 1d
tcs-aai-aai-resources-588998b4ff-qnlvd 0/2 Init:0/1 165 1d
tcs-aai-aai-search-data-57666c9494-n24bc 2/2 Running 0 1d
tcs-aai-aai-sparky-be-7db4b8dcf-qr29r 0/2 Init:0/1 0 1d
tcs-aai-aai-spike-6f9f5f5c9d-fxcrs 2/2 Running 0 1d
tcs-aai-aai-traversal-7df69d5885-72kj5 0/2 Init:0/1 165 1d
tcs-aai-aai-traversal-update-query-data-gr2w8 0/1 Init:0/1 165 1d
tcs-consul-consul-69d7c64bdd-wms5l 1/1 Running 0 1d
tcs-consul-consul-server-0 1/1 Running 0 1d
tcs-consul-consul-server-1 1/1 Running 0 1d
tcs-consul-consul-server-2 1/1 Running 0 1d
tcs-dcaegen2-dcae-bootstrap-6b6bb89cd5-5vhmv 1/1 Running 0 1d
tcs-dcaegen2-dcae-cloudify-manager-6b6f59fc66-k79c9 1/1 Running 0 1d
tcs-dcaegen2-dcae-db-0 1/1 Running 0 1d
tcs-dcaegen2-dcae-db-1 1/1 Running 0 1d
tcs-dcaegen2-dcae-healthcheck-5fc6d94989-ch5kn 1/1 Running 0 1d
tcs-dcaegen2-dcae-pgpool-77b844664d-5hd4c 1/1 Running 0 1d
tcs-dcaegen2-dcae-pgpool-77b844664d-xxnh8 1/1 Running 0 1d
tcs-dcaegen2-dcae-redis-0 1/1 Running 0 1d
tcs-dcaegen2-dcae-redis-1 1/1 Running 0 1d
tcs-dcaegen2-dcae-redis-2 1/1 Running 0 1d
tcs-dcaegen2-dcae-redis-3 1/1 Running 0 1d
tcs-dcaegen2-dcae-redis-4 1/1 Running 0 1d
tcs-dcaegen2-dcae-redis-5 1/1 Running 0 1d
tcs-dmaap-dbc-pg-0 1/1 Running 0 1d
tcs-dmaap-dbc-pg-1 1/1 Running 0 1d
tcs-dmaap-dbc-pgpool-57d7b76446-qgrcs 1/1 Running 0 1d
tcs-dmaap-dbc-pgpool-57d7b76446-qphtf 1/1 Running 0 1d
tcs-dmaap-dmaap-bus-controller-7567b865b7-x7vtz 1/1 Running 0 1d
tcs-dmaap-dmaap-dr-db-655587488d-2j5b2 1/1 Running 1 1d
tcs-dmaap-dmaap-dr-node-649659c584-ctfjs 1/1 Running 0 1d
tcs-dmaap-dmaap-dr-prov-595cd8bc55-6kjtv 1/1 Running 6 1d
tcs-dmaap-message-router-5f7b985c88-dj8j2 1/1 Running 0 1d
tcs-dmaap-message-router-kafka-678fb8558b-hc768 1/1 Running 0 1d
tcs-dmaap-message-router-zookeeper-54bb8cd9cf-kl55w 1/1 Running 0 1d
tcs-esr-esr-gui-699d9f579b-hrhk9 1/1 Running 0 7h
tcs-esr-esr-server-85d8c5f57-vj7m4 2/2 Running 0 7h
tcs-msb-kube2msb-6db5fd8c85-xrbs7 1/1 Running 0 1d
tcs-msb-msb-consul-66445944b6-phjpk 1/1 Running 0 1d
tcs-msb-msb-discovery-6bd858b659-42xbp 2/2 Running 0 1d
tcs-msb-msb-eag-78fbb94cc9-7cn2p 2/2 Running 0 1d
tcs-msb-msb-iag-5d45c9999b-lwcb6 2/2 Running 0 1d
tcs-oof-cmso-db-0 0/1 CrashLoopBackOff 105 7h
tcs-oof-music-cassandra-0 1/1 Running 0 7h
tcs-oof-music-cassandra-1 1/1 Running 0 7h
tcs-oof-music-cassandra-2 1/1 Running 0 7h
tcs-oof-music-cassandra-job-config-z28dh 0/1 Completed 0 7h
tcs-oof-music-tomcat-8f64f65d8-4pbnp 1/1 Running 0 7h
tcs-oof-music-tomcat-8f64f65d8-dl58x 1/1 Running 0 7h
tcs-oof-music-tomcat-8f64f65d8-fmrjn 0/1 Init:0/3 0 7h
tcs-oof-oof-557c8c677d-rjb2b 0/1 Init:0/2 45 7h
tcs-oof-oof-cmso-service-5467475444-ppmd7 0/1 Init:0/2 45 7h
tcs-oof-oof-has-api-5d79d97fb7-nzfkx 0/1 Init:0/3 45 7h
tcs-oof-oof-has-controller-658bbb894-m9dqf 0/1 Init:0/3 45 7h
tcs-oof-oof-has-data-5575788564-zbv4x 0/1 Init:0/4 45 7h
tcs-oof-oof-has-healthcheck-k5swr 0/1 Init:0/1 45 7h
tcs-oof-oof-has-onboard-bvfsz 0/1 Init:0/2 45 7h
tcs-oof-oof-has-reservation-9fd696d8d-dbw2k 0/1 Init:0/4 45 7h
tcs-oof-oof-has-solver-7fd7878df9-dn4xw 0/1 Init:0/4 45 7h
tcs-oof-zookeeper-0 1/1 Running 0 7h
tcs-oof-zookeeper-1 1/1 Running 0 7h
tcs-oof-zookeeper-2 1/1 Running 0 7h
tcs-robot-robot-86d89ffdb9-xd65b 1/1 Running 0 1d
You have new mail in /var/spool/mail/root
[root@node201 kubernetes]#
Thanks
Syed Atif Husain
I am trying to revive my ONAP instance after disaster recovery of the etcd cluster.
All 6 of my kube-system pods are running, and there is also one old tiller-deploy pod in Unknown state, but I am unable to deploy new pods. Both sudo helm deploy and helm list give the same error:
onap-m1@onap-m1:~$ helm list
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
onap-m1@onap-m1:~/oom/kubernetes/nbi$ sudo helm deploy nbi local/onap --namespace onap --verbose
fetching local/onap
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
Error: forwarding ports: error upgrading connection: unable to upgrade connection: pod does not exist
onap-m1@onap-m1:~/oom/kubernetes/nbi$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
heapster-7b48b696fc-rzl8j 1/1 Running 0 49d
kube-dns-6655f78c68-hqpld 3/3 Running 0 49d
kubernetes-dashboard-6f54f7c4b-s94dh 1/1 Running 0 49d
monitoring-grafana-7877679464-jmkcc 1/1 Running 0 49d
monitoring-influxdb-64664c6cf5-8k5nm 1/1 Running 0 49d
tiller-deploy-6f4745cbcf-7ccql 1/1 Running 0 49d
tiller-deploy-b5f895978-p7vtc 0/1 Unknown 0 55d
onap-m1@onap-m1:~/oom/kubernetes/nbi$ helm init --upgrade
$HELM_HOME has been configured at /home/infyonap-m1/.helm.
Tiller (the Helm server-side component) has been upgraded to the current version.
Happy Helming!
Saurabh Arora
Hi All,
I was trying to install the ONAP Casablanca release using the links below:
https://gerrit.onap.org/r/gitweb?p=oom.git;a=blob;f=kubernetes/README.md
https://docs.onap.org/en/casablanca/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html#onap-on-kubernetes-with-rancher
But I am not able to install ONAP using the nexus repo. I am trying to use the minimal-deployment.yaml file for VNF spawning. Below are the errors:
root@kubernetes:/home/ubuntu# kubectl describe po dev-aai-aai-schema-service-7d545fd565-tzq4t -n onap
Name: dev-aai-aai-schema-service-7d545fd565-tzq4t
Namespace: onap
Node: localhost/172.19.51.202
Start Time: Wed, 01 May 2019 15:43:58 +0000
Labels: app=aai-schema-service
pod-template-hash=3810198121
release=dev-aai
Annotations: checksum/config=de556424c5e051ed5b7ffb86e02d72473279ccecc54fab78c5954f67f83f8bcf
kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"onap","name":"dev-aai-aai-schema-service-7d545fd565","uid":"eecdfd1c-6c27-11e9-ba...
Status: Pending
IP: 10.42.119.74
Created By: ReplicaSet/dev-aai-aai-schema-service-7d545fd565
Controlled By: ReplicaSet/dev-aai-aai-schema-service-7d545fd565
Containers:
aai-schema-service:
Container ID:
Image: nexus3.onap.org:10001/onap/aai-schema-service:1.0-STAGING-latest
Image ID:
Ports: 8452/TCP, 5005/TCP
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Readiness: tcp-socket :8452 delay=60s timeout=1s period=10s #success=1 #failure=3
Environment:
LOCAL_USER_ID: 1000
LOCAL_GROUP_ID: 1000
Mounts:
/etc/localtime from localtime (ro)
/opt/aai/logroot/AAI-SS from dev-aai-aai-schema-service-logs (rw)
/opt/app/aai-schema-service/resources/application.properties from springapp-conf (rw)
/opt/app/aai-schema-service/resources/etc/appprops/aaiconfig.properties from aaiconfig-conf (rw)
/opt/app/aai-schema-service/resources/etc/auth/aai_keystore from auth-truststore-sec (rw)
/opt/app/aai-schema-service/resources/etc/auth/realm.properties from realm-conf (rw)
/opt/app/aai-schema-service/resources/localhost-access-logback.xml from localhost-access-log-conf (rw)
/opt/app/aai-schema-service/resources/logback.xml from dev-aai-aai-schema-service-log-conf (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tcdfh (ro)
filebeat-onap:
Container ID: docker://f017f17a263f55165fdc56622488824929e4dd5ba8c11f34fd30c57c24188721
Image: docker.elastic.co/beats/filebeat:5.5.0
Image ID: docker-pullable://docker.elastic.co/beats/filebeat@sha256:fe7602b641ed8ee288f067f7b31ebde14644c4722d9f7960f176d621097a5942
Port: <none>
State: Running
Started: Wed, 01 May 2019 15:50:40 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/filebeat/data from dev-aai-aai-schema-service-filebeat (rw)
/usr/share/filebeat/filebeat.yml from filebeat-conf (rw)
/var/log/onap from dev-aai-aai-schema-service-logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tcdfh (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
aai-common-aai-auth-mount:
Type: Secret (a volume populated by a Secret)
SecretName: aai-common-aai-auth
Optional: false
localtime:
Type: HostPath (bare host directory volume)
Path: /etc/localtime
filebeat-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: aai-filebeat
Optional: false
dev-aai-aai-schema-service-logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
dev-aai-aai-schema-service-filebeat:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
dev-aai-aai-schema-service-log-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dev-aai-aai-schema-service-log
Optional: false
localhost-access-log-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dev-aai-aai-schema-service-localhost-access-log-configmap
Optional: false
springapp-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dev-aai-aai-schema-service-springapp-configmap
Optional: false
aaiconfig-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dev-aai-aai-schema-service-aaiconfig-configmap
Optional: false
realm-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dev-aai-aai-schema-service-realm-configmap
Optional: false
auth-truststore-sec:
Type: Secret (a volume populated by a Secret)
SecretName: aai-common-truststore
Optional: false
default-token-tcdfh:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tcdfh
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m default-scheduler Successfully assigned dev-aai-aai-schema-service-7d545fd565-tzq4t to localhost
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "dev-aai-aai-schema-service-logs"
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "dev-aai-aai-schema-service-filebeat"
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "localtime"
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "filebeat-conf"
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "realm-conf"
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "dev-aai-aai-schema-service-log-conf"
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "localhost-access-log-conf"
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "aaiconfig-conf"
Normal SuccessfulMountVolume 9m kubelet, localhost MountVolume.SetUp succeeded for volume "springapp-conf"
Normal SuccessfulMountVolume 8m (x3 over 8m) kubelet, localhost (combined from similar events): MountVolume.SetUp succeeded for volume "default-token-tcdfh"
Warning Failed 6m kubelet, localhost Failed to pull image "nexus3.onap.org:10001/onap/aai-schema-service:1.0-STAGING-latest": rpc error: code = Unknown desc = Error while pulling image: Get http://nexus3.onap.org:10001/v1/repositories/onap/aai-schema-service/images: dial tcp 199.204.45.137:10001: getsockopt: no route to host
Normal Pulling 6m kubelet, localhost pulling image "docker.elastic.co/beats/filebeat:5.5.0"
Normal Pulled 2m kubelet, localhost Successfully pulled image "docker.elastic.co/beats/filebeat:5.5.0"
Warning FailedSync 2m kubelet, localhost Error syncing pod
Normal Created 2m kubelet, localhost Created container
Normal Started 2m kubelet, localhost Started container
Normal Pulling 2m (x2 over 8m) kubelet, localhost pulling image "nexus3.onap.org:10001/onap/aai-schema-service:1.0-STAGING-latest"
root@kubernetes:/home/ubuntu# docker pull nexus3.onap.org/onap/aai-schema-service:1.0-STAGING-latest
Pulling repository nexus3.onap.org/onap/aai-schema-service
Error: image onap/aai-schema-service:1.0-STAGING-latest not found
Do we have the required images in the nexus repo? If not, how are we deploying ONAP components? All the pods are throwing ErrImagePull errors:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system heapster-76b8cd7b5-7zbzh 1/1 Running 11 5d 10.42.225.253 localhost
kube-system kube-dns-5d7b4487c9-r8v5k 3/3 Running 30 5d 10.42.94.154 localhost
kube-system kubernetes-dashboard-f9577fffd-vlmnw 1/1 Running 10 5d 10.42.26.12 localhost
kube-system monitoring-grafana-997796fcf-x8ldh 1/1 Running 10 5d 10.42.169.42 localhost
kube-system monitoring-influxdb-56fdcd96b-9675t 1/1 Running 11 5d 10.42.99.139 localhost
kube-system tiller-deploy-dccdb6fd9-8zwz6 1/1 Running 4 1d 10.42.202.241 localhost
onap dev-aai-aai-67dc87974b-t4s8j 0/1 Init:ErrImagePull 0 3h 10.42.94.161 localhost
onap dev-aai-aai-babel-7bf9b8cd55-z7ss7 0/2 ErrImagePull 0 3h 10.42.11.69 localhost
onap dev-aai-aai-champ-7df94789d5-f4xm9 0/2 Init:ErrImagePull 0 3h 10.42.127.130 localhost
onap dev-aai-aai-data-router-79dc5c98ff-b44ll 0/2 Init:ErrImagePull 0 3h 10.42.58.117 localhost
onap dev-aai-aai-elasticsearch-55bb9dbb6c-bv5hl 0/1 Init:ErrImagePull 0 3h 10.42.211.63 localhost
onap dev-aai-aai-gizmo-5bb859b7db-65v2v 0/2 ErrImagePull 0 3h 10.42.133.140 localhost
onap dev-aai-aai-graphadmin-76dc9c4574-826t4 0/2 Init:ErrImagePull 0 3h 10.42.58.205 localhost
onap dev-aai-aai-graphadmin-create-db-schema-mm2xh 0/1 Init:ErrImagePull 0 3h 10.42.65.67 localhost
onap dev-aai-aai-modelloader-6f79dbd958-jkjfv 0/2 ErrImagePull 0 3h 10.42.62.17 localhost
onap dev-aai-aai-resources-5cfc9c854b-g8hlz 0/2 Init:ErrImagePull 0 3h 10.42.47.31 localhost
onap dev-aai-aai-schema-service-7d545fd565-qknrc 0/2 ErrImagePull 0 3h 10.42.32.28 localhost
onap dev-aai-aai-search-data-6fb67f5b8c-kqpb6 0/2 ErrImagePull 0 3h 10.42.238.99 localhost
onap dev-aai-aai-sparky-be-6b6c764dbf-qrvlt 0/2 Init:ErrImagePull 0 3h 10.42.11.139 localhost
onap dev-aai-aai-spike-7fbd58df9c-d4cnj 0/2 Init:ErrImagePull 0 3h 10.42.82.137 localhost
onap dev-aai-aai-traversal-574d9495c4-9fm74 0/2 Init:ErrImagePull 0 3h 10.42.44.10 localhost
onap dev-aai-aai-traversal-update-query-data-lgpg8 0/1 Init:ErrImagePull 0 3h 10.42.42.141 localhost
onap dev-dmaap-dbc-pg-0 0/1 Init:ErrImagePull 0 3h 10.42.60.58 localhost
onap dev-dmaap-dbc-pgpool-64ffd87f58-8rbsx 0/1 ErrImagePull 0 3h 10.42.225.133 localhost
onap dev-dmaap-dbc-pgpool-64ffd87f58-hmg9q 0/1 ErrImagePull 0 3h 10.42.132.89 localhost
onap dev-dmaap-dmaap-bc-86f795c7d7-rzd66 0/1 Init:ErrImagePull 0 3h 10.42.6.134 localhost
onap dev-dmaap-dmaap-bc-post-install-cklt6 0/1 ErrImagePull 0 3h 10.42.6.109 localhost
onap dev-dmaap-dmaap-dr-db-0 0/1 Init:ErrImagePull 0 3h 10.42.148.194 localhost
onap dev-dmaap-dmaap-dr-node-0 0/2 Init:ErrImagePull 0 3h 10.42.215.234 localhost
onap dev-dmaap-dmaap-dr-prov-6cb4fdf5f5-247lk 0/2 Init:ErrImagePull 0 3h 10.42.33.228 localhost
onap dev-dmaap-message-router-0 0/1 Init:ErrImagePull 0 3h 10.42.189.4 localhost
onap dev-dmaap-message-router-kafka-0 0/1 Init:ErrImagePull 0 3h 10.42.197.123 localhost
onap dev-dmaap-message-router-kafka-1 0/1 Init:ErrImagePull 0 3h 10.42.160.21 localhost
onap dev-dmaap-message-router-kafka-2 0/1 Init:ErrImagePull 0 3h 10.42.65.237 localhost
onap dev-dmaap-message-router-mirrormaker-5879bcc59c-kk87r 0/1 Init:ErrImagePull 0 3h 10.42.88.225 localhost
onap dev-dmaap-message-router-zookeeper-0 0/1 Init:ErrImagePull 0 3h 10.42.99.148 localhost
onap dev-dmaap-message-router-zookeeper-1 0/1 Init:ErrImagePull 0 3h 10.42.20.211 localhost
onap dev-dmaap-message-router-zookeeper-2 0/1 Init:ErrImagePull 0 3h 10.42.196.212 localhost
onap dev-nfs-provisioner-nfs-provisioner-c55796c8-8rz78 0/1 ErrImagePull 0 3h 10.42.97.104 localhost
onap dev-portal-portal-app-6bb6f9fc84-7ts28 0/2 Init:ErrImagePull 0 3h 10.42.242.31 localhost
onap dev-portal-portal-cassandra-56b589b85d-bkzn7 0/1 ErrImagePull 0 3h 10.42.197.68 localhost
onap dev-portal-portal-db-64d77c6965-cdt8z 0/1 ErrImagePull 0 3h 10.42.119.237 localhost
onap dev-portal-portal-db-config-xdn45 0/2 Init:ErrImagePull 0 3h 10.42.23.217 localhost
onap dev-portal-portal-sdk-74ddb7b88b-hzw47 0/2 Init:ErrImagePull 0 3h 10.42.186.32 localhost
onap dev-portal-portal-widget-7c88cc5644-5jt7p 0/1 Init:ErrImagePull 0 3h 10.42.40.35 localhost
onap dev-portal-portal-zookeeper-dc97c5cb8-rrcmp 0/1 ErrImagePull 0 3h 10.42.179.159 localhost
onap dev-robot-robot-5f6f964796-7lfjs 0/1 ErrImagePull 0 3h 10.42.101.201 localhost
onap dev-sdc-sdc-be-7bd86986c-92ns4 0/2 Init:ErrImagePull 0 3h 10.42.217.170 localhost
onap dev-sdc-sdc-be-config-backend-2wbr7 0/1 Init:ErrImagePull 0 3h 10.42.95.216 localhost
onap dev-sdc-sdc-cs-6b6df5b7bf-gqd6v 0/1 ErrImagePull 0 3h 10.42.216.158 localhost
onap dev-sdc-sdc-cs-config-cassandra-n95rr 0/1 Init:ErrImagePull 0 3h 10.42.101.76 localhost
onap dev-sdc-sdc-dcae-be-5dbf4958d5-mngdg 0/2 Init:ErrImagePull 0 3h 10.42.45.18 localhost
onap dev-sdc-sdc-dcae-be-tools-wcx5p 0/1 Init:ErrImagePull 0 3h 10.42.170.29 localhost
onap dev-sdc-sdc-dcae-dt-5486658bd4-q4jgh 0/2 Init:ErrImagePull 0 3h 10.42.236.131 localhost
onap dev-sdc-sdc-dcae-fe-7bf7bb6868-w942h 0/2 Init:ErrImagePull 0 3h 10.42.106.10 localhost
onap dev-sdc-sdc-dcae-tosca-lab-7f464d664-59s5r 0/2 Init:ErrImagePull 0 3h 10.42.211.93 localhost
onap dev-sdc-sdc-es-794fbfdc-cfrk2 0/1 ErrImagePull 0 3h 10.42.171.32 localhost
onap dev-sdc-sdc-es-config-elasticsearch-jfzdk 0/1 Init:ErrImagePull 0 3h 10.42.133.249 localhost
onap dev-sdc-sdc-fe-6dbf9b9499-vddzz 0/2 Init:ErrImagePull 0 3h 10.42.56.11 localhost
onap dev-sdc-sdc-kb-857697d4b9-j4xrq 0/1 Init:ErrImagePull 0 3h 10.42.176.202 localhost
onap dev-sdc-sdc-onboarding-be-78b9b774d7-hntlc 0/2 Init:ErrImagePull 0 3h 10.42.59.151 localhost
onap dev-sdc-sdc-onboarding-be-cassandra-init-4zg7x 0/1 Init:ErrImagePull 0 3h 10.42.146.197 localhost
onap dev-sdc-sdc-wfd-be-84875d7cbc-hp8lq 0/1 Init:ErrImagePull 0 3h 10.42.74.131 localhost
onap dev-sdc-sdc-wfd-be-workflow-init-b8hb4 0/1 Init:ErrImagePull 0 3h 10.42.201.111 localhost
onap dev-sdc-sdc-wfd-fe-75b667c9d4-b5mvp 0/2 Init:ErrImagePull 0 3h 10.42.60.77 localhost
onap dev-sdnc-cds-blueprints-processor-5d8b7df7c9-zxcvd 0/1 Init:ErrImagePull 0 3h 10.42.224.132 localhost
onap dev-sdnc-cds-command-executor-64b8df54b6-gg6hq 0/1 Init:ErrImagePull 0 3h 10.42.31.166 localhost
onap dev-sdnc-cds-controller-blueprints-599bf864f8-ss6z6 0/1 Init:ErrImagePull 0 3h 10.42.181.199 localhost
onap dev-sdnc-cds-db-0 0/1 Init:ErrImagePull 0 3h 10.42.208.121 localhost
onap dev-sdnc-cds-ui-69b899bc56-9r69r 0/1 ErrImagePull 0 3h 10.42.215.100 localhost
onap dev-sdnc-nengdb-0 0/1 Init:ErrImagePull 0 3h 10.42.24.96 localhost
onap dev-sdnc-network-name-gen-5b54568465-lfbkh 0/1 Init:ErrImagePull 0 3h 10.42.177.232 localhost
onap dev-sdnc-sdnc-0 0/2 Init:ErrImagePull 0 3h 10.42.39.93 localhost
onap dev-sdnc-sdnc-ansible-server-56bc6fcd6-h6m4z 0/1 Init:ErrImagePull 0 3h 10.42.233.133 localhost
onap dev-sdnc-sdnc-dgbuilder-55cf7b4d7d-v82l6 0/1 Init:ErrImagePull 0 3h 10.42.8.213 localhost
onap dev-sdnc-sdnc-dmaap-listener-66585656b5-v7z4w 0/1 Init:ErrImagePull 0 3h 10.42.24.3 localhost
onap dev-sdnc-sdnc-ueb-listener-789664f965-jz66b 0/1 Init:ErrImagePull 0 3h 10.42.26.58 localhost
onap dev-so-so-56bdcf95fc-w596f 0/1 Init:ErrImagePull 0 3h 10.42.27.80 localhost
onap dev-so-so-bpmn-infra-cff9cb58f-mb7qb 0/1 Init:ErrImagePull 0 3h 10.42.250.85 localhost
onap dev-so-so-catalog-db-adapter-6c99c756fd-4tz8z 0/1 Init:ErrImagePull 0 3h 10.42.199.93 localhost
onap dev-so-so-mariadb-config-job-6j9cl 0/1 Init:ErrImagePull 0 3h 10.42.140.183 localhost
onap dev-so-so-monitoring-5f779f77d9-rlml2 0/1 Init:ErrImagePull 0 3h 10.42.140.167 localhost
onap dev-so-so-openstack-adapter-7c4b76f694-jk7hf 0/1 Init:ErrImagePull 0 3h 10.42.229.180 localhost
onap dev-so-so-request-db-adapter-74d8bbd8bd-mwktt 0/1 Init:ErrImagePull 0 3h 10.42.169.84 localhost
onap dev-so-so-sdc-controller-bcccff948-rbbrw 0/1 Init:ErrImagePull 0 3h 10.42.61.156 localhost
onap dev-so-so-sdnc-adapter-86c457f8b9-hhqst 0/1 ErrImagePull 0 3h 10.42.0.192 localhost
onap dev-so-so-vfc-adapter-78d6cbff45-kv9b2 0/1 Init:ErrImagePull 0 3h 10.42.218.251 localhost
onap dev-so-so-vnfm-adapter-79467b8754-cvcnl 0/1 ErrImagePull 0 3h 10.42.155.68 localhost
onap dev-vid-vid-844f85cdf5-4c6jb 0/2 Init:ErrImagePull 0 3h 10.42.71.181 localhost
onap dev-vid-vid-galera-config-n46pk 0/1 Init:ErrImagePull 0 3h 10.42.68.120 localhost
onap dev-vid-vid-mariadb-galera-0 0/1 ErrImagePull 0 3h 10.42.170.206 localhost
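One detail worth checking in the output above: the pod spec references the image as nexus3.onap.org:10001/onap/aai-schema-service:1.0-STAGING-latest, while the manual docker pull was run against nexus3.onap.org without the :10001 port, so the two are not talking to the same registry endpoint. The kubelet event also reports "no route to host" for port 10001, which suggests a connectivity problem rather than a missing image. A small sketch, with a hypothetical helper image_registry, that extracts the registry part so the two references can be compared; it assumes the reference always starts with a registry host.

```shell
# Hypothetical helper: print the registry part (host[:port]) of an image
# reference, i.e. everything before the first "/".
image_registry() {
  printf '%s\n' "${1%%/*}"
}

# Image reference exactly as it appears in the pod description above.
image="nexus3.onap.org:10001/onap/aai-schema-service:1.0-STAGING-latest"

# Prints nexus3.onap.org:10001
image_registry "$image"

# On the node, pull with the complete reference (not run here):
#   docker pull "$image"
```

If the pull with the full reference also fails with a routing error, the node's outbound access to port 10001 (proxy or firewall) is the first thing to check.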
Kumar Lakshman Kumar
Hello,
I am trying to install ONAP Dublin and am getting this error while running make all:
Error: Can't get a valid version for repositories aai. Try changing the version constraint in requirements.yaml
make[1]: *** [dep-onap] Error 1
make[1]: Leaving directory `/home/centos/dublin/oom/kubernetes'
make: *** [onap] Error 2
The oom/kubernetes/aai folder is empty. Has anyone solved this issue?
Brian Freeman
Did you add the --recurse-submodules flag?
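For anyone hitting the same thing, an empty chart folder such as oom/kubernetes/aai is the typical sign of a submodule that was never checked out. A minimal sketch: the helper empty_chart_dirs is hypothetical, and the clone URL and branch name in the comments are assumptions to adapt to your release.

```shell
# Hypothetical helper: list the immediate subdirectories of $1 that are
# empty, which is how an uninitialised submodule shows up on disk.
empty_chart_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -empty
}

# Not run here; URL and branch are assumptions:
#   git clone -b dublin --recurse-submodules http://gerrit.onap.org/r/oom
# For a clone that already exists:
#   git -C oom submodule update --init --recursive
#   empty_chart_dirs oom/kubernetes
```

After the submodule update, empty_chart_dirs should no longer list the aai folder, and make all can be retried.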
Gaurav Mittal
Hi,
I am installing the ONAP Dublin release using OOM with RKE and OpenStack, following the documentation provided in the link. The pods below are in CrashLoopBackOff because of "No private IPv4 address found". Any help would be greatly appreciated.
dev-consul-consul-9dfb9876c-vnh6r
dev-consul-consul-server-0
dev-aaf-aaf-sms-vault-0
Regards,
Gaurav
Kumar Lakshman Kumar
Hi Gaurav,
This looks like a networking issue in the Kubernetes cluster. If you can provide more details, e.g. some logs, it will be easier to answer.
Gaurav Mittal
Hi Lakshman,
I really appreciate your response on this. Below are the setup details.
OpenStack details -
We have 7 bare-metal Dell R630 servers (16 CPUs and 64 GB RAM each). All run Ubuntu, and the firewall and iptables are disabled.
Server 1 - all-in-one packstack install of the Queens flavor
Server 2 to Server 7 - Compute nodes.
Followed the ONAP Dublin release installation from the link below (ONAP on HA Kubernetes Cluster).
https://docs.onap.org/en/dublin/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html#onap-on-kubernetes-with-rancher
Created 2 network interfaces -
Public - 10.211.5.X/24 (Gateway - 10.211.5.254)
Private - 10.0.0.0/16 (Gateway - 10.0.0.1)
Created a router which connects the above two interfaces. The security group is the default one, with rules added to allow all ingress and egress traffic for TCP, UDP, ICMP, DNS, etc.
The Kubernetes cluster is created using RKE and the cluster.yml file. Below are the Kubernetes cluster details.
Master - 1 node
Worker - 6 nodes
Also created an instance where RKE is installed and which also serves as an NFS server.
Kindly let me know which logs can help to debug the issue.
Regards,
Gaurav
Kumar Lakshman Kumar
Hi Gaurav,
If all pods fail with the same issue, provide these logs for one of the pods:
kubectl describe pod <pod-name>
kubectl logs <pod-name>
Also make sure kubectl get nodes shows all your nodes as Ready.
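The node check can be scripted so that only problem nodes are printed. A minimal sketch, assuming the default `kubectl get nodes` columns (NAME STATUS ...) with a header row; the helper name nodes_not_ready is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# the name and status of every node whose STATUS is not exactly "Ready".
nodes_not_ready() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage (not run here):
#   kubectl get nodes | nodes_not_ready
```

Cordoned nodes (status "Ready,SchedulingDisabled") are listed too, which is intentional here because pods cannot be scheduled on them.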
Gaurav Mittal
Applications that I am installing are: aaf, aai, cassandra, consul, dcaegen2, dmaap, mariadb-galera, msb, portal.
Only the pods below have problems.
dev-aaf-aaf-sms-5974b5cc94-ckrgl
dev-aaf-aaf-sms-preload-rg92x
dev-aaf-aaf-sms-vault-0
dev-aaf-aaf-sshsm-testca-fb8tk
dev-consul-consul-9dfb9876c-jtq2m
dev-consul-consul-server-0
dev-dcaegen2-dcae-bootstrap-7998db797b-xkhfh
dev-dcaegen2-dcae-config-binding-service-5c7564f46c-6284g
dev-dcaegen2-dcae-deployment-handler-757f587777-c68rm
dev-dcaegen2-dcae-policy-handler-7b4cb7ddc-j5wt5
dev-dcaegen2-dcae-servicechange-handler-cfb88cd89-wmvgq
ubuntu@rke:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready controlplane,etcd 18h v1.13.5
k8s-slave-1 Ready worker 18h v1.13.5
k8s-slave-2 Ready worker 18h v1.13.5
k8s-slave-3 Ready worker 18h v1.13.5
k8s-slave-4 Ready worker 18h v1.13.5
k8s-slave-5 Ready worker 18h v1.13.5
k8s-slave-6 Ready worker 18h v1.13.5
ubuntu@rke:~$
Attached is the output requested for consul.
han cock
Hi, everyone.
I'm new here, so sorry if my question seems trivial.
I would like to join the ONAP community.
First of all, I have been trying to get some modules to start on my PC using minikube, but to no avail.
I followed the documentation at this link https://github.com/onap/oom/blob/master/kubernetes/README.md and most things seem to be deprecated, like Helm Tiller.
Nonetheless, I patched things up and managed to deploy. But in the end my installation had no pods to show, despite successfully installing the Helm charts.
So I came here to follow the installation links, only to realise that the links highlighted above are broken.
Please can someone provide viable links for the installation of OOM with Kubernetes?
Eagerly waiting.
Thanks
Sergey Kolpakov
Hi, I've got the same issue. Did you get answers to your questions?
bao air
Hi, here: https://docs.onap.org/projects/onap-oom/en/guilin/oom_quickstart_guide_helm3.html