2 answers
Marcus, I have experienced the same problem with jobs. It seems that jobs only retry for a limited time while waiting for their dependencies, so later the installation can get stuck without manual intervention. Some pods wait for e.g. the database schema to be applied, which is done by only one of these jobs.
To solve it I exported every failed job definition to a YAML file, removed the two lines containing "controller-uid" from that file, deleted the job with the unsuccessful status from Kubernetes (kubectl delete job failedjob), and created it again (kubectl create -f jobdefinition.yaml).
Once those jobs finished successfully, the rest of the installation proceeded.
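That workaround can be scripted. A minimal sketch, assuming a job named "failedjob" in the "onap" namespace (both names are examples, substitute your own); the kubectl steps are shown as comments, and the edit step operates on a sample of what the exported definition contains:

```shell
# Workflow (needs a live cluster, shown as comments here):
#   kubectl get job failedjob -n onap -o yaml > jobdefinition.yaml
#   kubectl delete job failedjob -n onap
#   ...edit jobdefinition.yaml as below...
#   kubectl create -f jobdefinition.yaml
#
# Sample exported definition: controller-uid appears twice, once under
# metadata.labels and once under spec.selector.matchLabels.
cat > jobdefinition.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    controller-uid: 11111111-2222-3333-4444-555555555555
    job-name: failedjob
  name: failedjob
spec:
  selector:
    matchLabels:
      controller-uid: 11111111-2222-3333-4444-555555555555
EOF
# Drop every controller-uid line so the API server assigns a fresh UID
# when the job is re-created.
sed -i '/controller-uid/d' jobdefinition.yaml
```

In my understanding, leaving the controller-uid lines in makes the kubectl create call fail, because the selector still references the UID of the deleted job.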
Marcus, this is very good triage work - I think it deserves its own JIRA epic or wiki page and an onap-discuss article. Each app-specific workaround/issue would need a subtask; I'll link to this work where applicable for some shared issues bringing the system up. On the 11th of June I was able to get everything except 3 pods up on a clean system (AWS this time), but I used an overkill cluster of 4 VMs with 16 cores and 122 GB RAM each and 20 Gbps network links - a portion of these failures are random timing related.
/michael
- Marcus D
Hi Michael,
Is there documentation regarding what configurations I should use for the OpenStack options in the values.yaml file? I would like to confirm that sharing the admin user among all of these roles is acceptable.
Thanks,
Marcus
- Gülsüm Atıcı
Hello,
In the /oom/kubernetes/onap/values.yaml file, 3 modules (appc, nbi, so) include OpenStack-related parameters. Which parameters need to be customized, and how can we do it?
I couldn't find a clear explanation of this file. An example configuration would be very useful.
Thanks a lot.
appc:
  enabled: true
  config:
    openStackType: OpenStackProvider
    openStackName: OpenStack
    openStackKeyStoneUrl: http://localhost:8181/apidoc/explorer/index.html
    openStackServiceTenantName: default
    openStackDomain: default
    openStackUserName: admin
    openStackEncryptedPassword: admin
nbi:
  enabled: true
  config:
    # openstack configuration
    openStackRegion: "Yolo"
    openStackVNFTenantId: "1234"
so:
  enabled: true
  replicaCount: 1
  liveness:
    # necessary to disable liveness probe when setting breakpoints
    # in debugger so K8s doesn't restart unresponsive container
    enabled: true
  # so server configuration
  config:
    # message router configuration
    dmaapTopic: "AUTO"
    # openstack configuration
    openStackUserName: "vnf_user"
    openStackRegion: "RegionOne"
    openStackKeyStoneUrl: "http://1.2.3.4:5000"
    openStackServiceTenantName: "service"
    openStackEncryptedPasswordHere: "b51fd164d68bdf2ef9fr94gvtrlk4jkgbr"
  # configure embedded mariadb
  mariadb:
    config:
      mariadbRootPassword: password
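For reference, a common Helm pattern (an assumption based on general Helm usage, not ONAP-specific documentation) is to put only the OpenStack values you need to change into a separate override file and pass it with -f at deploy time, rather than editing values.yaml in place. The endpoint, tenant, and user names below are placeholders:

```shell
# Hypothetical override file carrying only OpenStack-specific values;
# the keys mirror the ones shown in onap/values.yaml above, and the
# addresses/credentials are examples to be replaced with your own.
cat > openstack-overrides.yaml <<'EOF'
appc:
  config:
    openStackKeyStoneUrl: "http://10.0.0.10:5000"
    openStackServiceTenantName: "service"
    openStackUserName: "onap_user"
so:
  config:
    openStackUserName: "onap_user"
    openStackRegion: "RegionOne"
    openStackKeyStoneUrl: "http://10.0.0.10:5000"
    openStackServiceTenantName: "service"
EOF
# Then pass the file at install time (needs a live cluster, shown as a comment):
#   helm install local/onap --name onap --namespace onap -f openstack-overrides.yaml
```

This keeps site-specific secrets out of the tracked values.yaml and makes it easy to re-apply the same settings on a fresh deploy.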
Hello,
I am trying to install the Beijing release on OpenStack with Kubernetes. I have 3 VMs: one for Rancher and two for Kubernetes. They have the following specs:
Instance Name    VCPUs  Disk   RAM   Time since created
onap_rancher_vm  2      40GB   4GB   6 days, 20 hours
onap-k8s-vm-2    16     100GB  64GB  6 days, 19 hours
onap-k8s-vm-1    16     100GB  64GB  6 days, 19 hours
I am following the quick start guide to install ONAP: https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_quickstart_guide.html. I am fairly new to OpenStack, Kubernetes, and ONAP.
After letting the install run for more than an hour, the following pods fail to become ready:
sdc-be
sdc-be-config
sdc-es
sdc-es-config-elasticsearch
sdc-fe
sdc-kb
sdnc-db
sdnc-ueb-listener
Below are the relevant logs, and what I did while trying to bring each pod to a ready state:
Warning Unhealthy 2m (x52 over 11m) kubelet, onap-k8s-vm-1 Readiness probe failed: APPC is not healthy.
++ ps -e
++ grep startODL
++ wc -l
+ startODL_status=1
++ /opt/opendaylight/current/bin/client bundle:list
++ grep Waiting
++ wc -l
Failed to get the session.
+ waiting_bundles=0
++ /opt/opendaylight/current/bin/client system:start-level
Failed to get the session.
+ run_level=
+ '[' '' == 'Level 100' ']'
+ echo APPC is not healthy.
+ exit 1
WARNING: jmx_fqdn= missing name or value
WARNING: policy_msOnapName= missing name or value
WARNING: policy_msPolicyName= missing name or value
Starting configure of brmsgw under policy:policy ownership with umask 0022.
WARNING: UEB_API_KEY= missing name or value
WARNING: UEB_API_SECRET= missing name or value
WARNING: BRMS_UEB_DELAY= missing name or value
WARNING: BRMS_UEB_API_KEY= missing name or value
WARNING: BRMS_UEB_API_SECRET= missing name or value
WARNING: UEB_API_KEY= missing name or value
WARNING: UEB_API_SECRET= missing name or value
WARNING: BRMS_UEB_DELAY= missing name or value
WARNING: BRMS_UEB_API_KEY= missing name or value
WARNING: BRMS_UEB_API_SECRET= missing name or value
clamp: removed health checks
2018-07-06 17:15:04 140190038890368 [Note] mysqld (mysqld 10.2.15-MariaDB-10.2.15+maria~jessie) starting as process 1 ...
2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Uses event mutexes
2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Compressed tables use zlib 1.2.8
2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Using Linux native AIO
2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Number of pools: 1
2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Using SSE2 crc32 instructions
2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Completed initialization of buffer pool
2018-07-06 17:15:04 140189302863616 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-07-06 17:15:06 140190038890368 [Note] InnoDB: Highest supported file format is Barracuda.
2018-07-06 17:15:18 140190038890368 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-07-06 17:15:18 140190038890368 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-07-06 17:15:18 140190038890368 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-07-06 17:15:18 140190038890368 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-07-06 17:15:18 140190038890368 [Note] InnoDB: 5.7.22 started; log sequence number 3730924
2018-07-06 17:15:18 140188923713280 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2018-07-06 17:15:18 140190038890368 [Note] Plugin 'FEEDBACK' is disabled.
2018-07-06 17:15:19 140190038890368 [Note] Server socket created on IP: '::'.
2018-07-06 17:15:20 140190038890368 [Warning] 'proxies_priv' entry '@% root@dev-portal-db-84f48ddccb-cslbv' ignored in --skip-name-resolve mode.
2018-07-06 17:15:21 140190038890368 [Note] Reading of all Master_info entries succeded
2018-07-06 17:15:21 140190038890368 [Note] Added new Master_info '' to hash table
2018-07-06 17:15:21 140190038890368 [Note] mysqld: ready for connections.
Version: '10.2.15-MariaDB-10.2.15+maria~jessie' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
2018-07-06 17:15:24 140188923713280 [Note] InnoDB: Buffer pool(s) load completed at 180706 17:15:24
2018-07-06 17:35:13 140189880547072 [Warning] Aborted connection 51 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 17:35:13 140189883221760 [Warning] Aborted connection 50 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 17:35:13 140189283657472 [Warning] Aborted connection 54 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 17:35:13 140189283960576 [Warning] Aborted connection 53 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 17:35:18 140189880850176 [Warning] Aborted connection 49 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 18:35:13 140189283657472 [Warning] Aborted connection 788 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 18:35:13 140189880547072 [Warning] Aborted connection 786 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 18:35:13 140189283354368 [Warning] Aborted connection 784 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 18:35:13 140189880850176 [Warning] Aborted connection 785 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
2018-07-06 18:35:13 140189883221760 [Warning] Aborted connection 787 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
sdc-be: cannot resolve; depends on sdc-es-config-elasticsearch
sdc-be-config: cannot resolve; depends on sdc-be
sdc-es: removed health check
sdc-es-config-elasticsearch: crashes with error in logs:
[2018-07-06T18:32:55+00:00] INFO: Retrying execution of ruby_block[check_ElasticSearch_Cluster_Health], 0 attempt(s) left
================================================================================
Error executing action `run` on resource 'ruby_block[check_ElasticSearch_Cluster_Health]'
================================================================================
Errno::ECONNREFUSED
-------------------
Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)
Cookbook Trace:
---------------
/root/chef-solo/cache/cookbooks/init-sdc-elasticsearch/recipes/ES_1_create_audit_template.rb:9:in `block (2 levels) in from_file'
...
System Info:
------------
chef_version=13.8.5
platform=alpine
platform_version=3.7.0
ruby=ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-linux-musl]
program_name=chef-solo worker: ppid=7;start=18:31:15;
executable=/usr/bin/chef-solo
[2018-07-06T18:32:57+00:00] INFO: Running queued delayed notifications before re-raising exception
[2018-07-06T18:32:57+00:00] INFO: Running queued delayed notifications before re-raising exception
Running handlers:
[2018-07-06T18:32:57+00:00] ERROR: Running exception handlers
[2018-07-06T18:32:57+00:00] ERROR: Running exception handlers
Running handlers complete
[2018-07-06T18:32:57+00:00] ERROR: Exception handlers complete
[2018-07-06T18:32:57+00:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 01 minutes 41 seconds
[2018-07-06T18:32:57+00:00] FATAL: Stacktrace dumped to /root/chef-solo/cache/chef-stacktrace.out
[2018-07-06T18:32:57+00:00] FATAL: Stacktrace dumped to /root/chef-solo/cache/chef-stacktrace.out
[2018-07-06T18:32:57+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2018-07-06T18:32:57+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2018-07-06T18:32:57+00:00] ERROR: ruby_block[check_ElasticSearch_Cluster_Health] (init-sdc-elasticsearch::ES_1_create_audit_template line 4) had an error: Errno::ECONNREFUSED: Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)
[2018-07-06T18:32:57+00:00] ERROR: ruby_block[check_ElasticSearch_Cluster_Health] (init-sdc-elasticsearch::ES_1_create_audit_template line 4) had an error: Errno::ECONNREFUSED: Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)
[2018-07-06T18:32:57+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
[2018-07-06T18:32:57+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
sdc-fe: relies on sdc-kb and sdc-be-config-backend job
sdc-kb: relies on sdc-es-config-elasticsearch
sdnc-db: became ready after removing health check for sdc-es
sdnc-ueb-listener: relies on sdnc-be
The following jobs fail to succeed, and do not try again after 7 retries:
The pods for which I removed the health check reported that the connection to the port had been refused and that they could not sync. The other pods, which I could not bring to a ready state or whose init containers I removed, depend on other pods, whose logs I have provided above. The system passes neither the Consul health checks nor the robot health check.
I should note that I have tried increasing the timeouts for the health check probes, but that has not resolved my issues.
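For what it's worth, the probe timings are themselves Helm values in most OOM charts, so they can also be raised with an override file at upgrade time. A sketch, with the caveat that the exact key names (liveness.initialDelaySeconds etc.) are assumptions following the common OOM chart pattern and should be checked against each component's values.yaml:

```shell
# Hypothetical override extending the liveness window for a slow component;
# the appc key path and the numbers are examples, not verified defaults.
cat > probe-overrides.yaml <<'EOF'
appc:
  liveness:
    initialDelaySeconds: 300
    periodSeconds: 60
EOF
# Apply to an existing deployment (needs a live cluster, shown as a comment):
#   helm upgrade onap local/onap --namespace onap -f probe-overrides.yaml
```

This only buys the pods more time; if a dependency never comes up (as with the sdc chain above), the probe will still eventually fail.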
I also have a few questions regarding the OpenStack configuration. I suspect I am not configuring the ./onap/values.yaml file correctly, specifically for appc, nbi, and so.
For appc:
For nbi:
For so:
Sorry for the long post and many questions. Thank you so much!