Beijing install pods fails to start and OpenStack configuration

Hello,

I am trying to install the Beijing release on OpenStack with Kubernetes. I have 3 VM's; one for Rancher, and two for Kubernetes. They have the following specs:

Instance Name VCPUs Disk RAM Time since created
onap_rancher_vm 2 40GB 4GB 6 days, 20 hours
onap-k8s-vm-2 16 100GB 64GB 6 days, 19 hours
onap-k8s-vm-1 16 100GB 64GB 6 days, 19 hours

I am following the quick start guide to install ONAP: https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_quickstart_guide.html. I am fairly new with OpenStack, Kubernetes, and ONAP.

After letting the install run for more than an hour, the following pods fail to become ready:

aaf-oauth
aaf-service
appc
brmsgw
clamp
clamp-dash-es
clamp-dash-kibana
dbc-pg
dcae-cloudify-manager
dcae-db
dmaap-bus-control
portal-app
sdc-be
sdc-be-config
sdc-es
sdc-es-config-elasticsearch
sdc-fe
sdc-kb
sdnc-db
sdnc-ueb-listener

The following is what I performed to bring the pods to a ready state:

aaf-oauth: removed health checks
aaf-service: removed health checks
appc: removed health checks
- the logs show the following:
- Warning Unhealthy 2m (x52 over 11m) kubelet, onap-k8s-vm-1 Readiness probe failed: APPC is not healthy.
  ++ ps -e
  ++ grep startODL
  ++ wc -l
  + startODL_status=1
  ++ /opt/opendaylight/current/bin/client bundle:list
  ++ grep Waiting
  ++ wc -l
  Failed to get the session.
  + waiting_bundles=0
  ++ /opt/opendaylight/current/bin/client system:start-level
  Failed to get the session.
  + run_level=
  + '[' '' == 'Level 100' ']'
  + echo APPC is not healthy.
  + exit 1
brmsgw: removed health checks
- the logs show the following:
- ARNING: jmx_fqdn= missing name or value
  WARNING: policy_msOnapName= missing name or value
  WARNING: policy_msPolicyName= missing name or value
  Starting configure of brmsgw under policy:policy ownership with umask 0022.
  WARNING: UEB_API_KEY= missing name or value
  WARNING: UEB_API_SECRET= missing name or value
  WARNING: BRMS_UEB_DELAY= missing name or value
  WARNING: BRMS_UEB_API_KEY= missing name or value
  WARNING: BRMS_UEB_API_SECRET= missing name or value
  WARNING: UEB_API_KEY= missing name or value
  WARNING: UEB_API_SECRET= missing name or value
  WARNING: BRMS_UEB_DELAY= missing name or value
  WARNING: BRMS_UEB_API_KEY= missing name or value
  WARNING: BRMS_UEB_API_SECRET= missing name or value
clamp: removed health checks
clamp-dash-es: deleted the pod and upon restarting it became ready after removing health checks for clamp
clamp-dash-kibana: became ready after removing health checks for clamp
dbc-pg: removed health checks
dcae-cloudify-manager: removed health checks
dcae-db: removed health checks
dmaap-bus-control: became ready after removing health checks for dbc-pg
portal-app: removed init-container
- the init-container relies on a pod created by a job portal-db-config, which relies on portal-db
- logs for portal-db:
  - 2018-07-06 17:15:04 140190038890368 [Note] mysqld (mysqld 10.2.15-MariaDB-10.2.15+maria~jessie) starting as process 1 ...
    2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
    2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Uses event mutexes
    2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Compressed tables use zlib 1.2.8
    2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Using Linux native AIO
    2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Number of pools: 1
    2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Using SSE2 crc32 instructions
    2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
    2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Completed initialization of buffer pool
    2018-07-06 17:15:04 140189302863616 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
    2018-07-06 17:15:06 140190038890368 [Note] InnoDB: Highest supported file format is Barracuda.
    2018-07-06 17:15:18 140190038890368 [Note] InnoDB: 128 out of 128 rollback segments are active.
    2018-07-06 17:15:18 140190038890368 [Note] InnoDB: Creating shared tablespace for temporary tables
    2018-07-06 17:15:18 140190038890368 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
    2018-07-06 17:15:18 140190038890368 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
    2018-07-06 17:15:18 140190038890368 [Note] InnoDB: 5.7.22 started; log sequence number 3730924
    2018-07-06 17:15:18 140188923713280 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
    2018-07-06 17:15:18 140190038890368 [Note] Plugin 'FEEDBACK' is disabled.
    2018-07-06 17:15:19 140190038890368 [Note] Server socket created on IP: '::'.
    2018-07-06 17:15:20 140190038890368 [Warning] 'proxies_priv' entry '@% root@dev-portal-db-84f48ddccb-cslbv' ignored in --skip-name-resolve mode.
    2018-07-06 17:15:21 140190038890368 [Note] Reading of all Master_info entries succeded
    2018-07-06 17:15:21 140190038890368 [Note] Added new Master_info '' to hash table
    2018-07-06 17:15:21 140190038890368 [Note] mysqld: ready for connections.
    Version: '10.2.15-MariaDB-10.2.15+maria~jessie' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
    2018-07-06 17:15:24 140188923713280 [Note] InnoDB: Buffer pool(s) load completed at 180706 17:15:24
    2018-07-06 17:35:13 140189880547072 [Warning] Aborted connection 51 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 17:35:13 140189883221760 [Warning] Aborted connection 50 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 17:35:13 140189283657472 [Warning] Aborted connection 54 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 17:35:13 140189283960576 [Warning] Aborted connection 53 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 17:35:18 140189880850176 [Warning] Aborted connection 49 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 18:35:13 140189283657472 [Warning] Aborted connection 788 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 18:35:13 140189880547072 [Warning] Aborted connection 786 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 18:35:13 140189283354368 [Warning] Aborted connection 784 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 18:35:13 140189880850176 [Warning] Aborted connection 785 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
    2018-07-06 18:35:13 140189883221760 [Warning] Aborted connection 787 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
sdc-be: cannot resolve; depends on sdc-es-config-elasticsearch
sdc-be-config: cannot resolve; depends on sdc-be
sdc-es: removed health check
sdc-es-config-elasticsearch: crashes with error in logs:
- [2018-07-06T18:32:55+00:00] INFO: Retrying execution of ruby_block[check_ElasticSearch_Cluster_Health], 0 attempt(s) left
  
  ================================================================================
  Error executing action `run` on resource 'ruby_block[check_ElasticSearch_Cluster_Health]'
  ================================================================================
  
  Errno::ECONNREFUSED
  -------------------
  Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)
  
  Cookbook Trace:
  ---------------
  /root/chef-solo/cache/cookbooks/init-sdc-elasticsearch/recipes/ES_1_create_audit_template.rb:9:in `block (2 levels) in from_file'
  ...
  System Info:
  ------------
  chef_version=13.8.5
  platform=alpine
  platform_version=3.7.0
  ruby=ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-linux-musl]
  program_name=chef-solo worker: ppid=7;start=18:31:15;
  executable=/usr/bin/chef-solo
  
  [2018-07-06T18:32:57+00:00] INFO: Running queued delayed notifications before re-raising exception
  [2018-07-06T18:32:57+00:00] INFO: Running queued delayed notifications before re-raising exception
  Running handlers:
  [2018-07-06T18:32:57+00:00] ERROR: Running exception handlers
  [2018-07-06T18:32:57+00:00] ERROR: Running exception handlers
  Running handlers complete
  [2018-07-06T18:32:57+00:00] ERROR: Exception handlers complete
  [2018-07-06T18:32:57+00:00] ERROR: Exception handlers complete
  Chef Client failed. 0 resources updated in 01 minutes 41 seconds
  [2018-07-06T18:32:57+00:00] FATAL: Stacktrace dumped to /root/chef-solo/cache/chef-stacktrace.out
  [2018-07-06T18:32:57+00:00] FATAL: Stacktrace dumped to /root/chef-solo/cache/chef-stacktrace.out
  [2018-07-06T18:32:57+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
  [2018-07-06T18:32:57+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
  [2018-07-06T18:32:57+00:00] ERROR: ruby_block[check_ElasticSearch_Cluster_Health] (init-sdc-elasticsearch::ES_1_create_audit_template line 4) had an error: Errno::ECONNREFUSED: Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)
  [2018-07-06T18:32:57+00:00] ERROR: ruby_block[check_ElasticSearch_Cluster_Health] (init-sdc-elasticsearch::ES_1_create_audit_template line 4) had an error: Errno::ECONNREFUSED: Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)
  [2018-07-06T18:32:57+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
  [2018-07-06T18:32:57+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
sdc-fe: relies on sdc-kb and sdc-be-config-backend job
sdc-kb: relies on sdc-es-config-elasticsearch
sdnc-db: became ready after removing health check for sdc-es
sdnc-ueb-listener: relies on sdnc-be

The following jobs fail to succeed, and fail to try again after 7 retries:

sdc-be-config-backend: depends on sdc-be
sdc-es-config-elastic-search: depends on sdc-be
portal-db-config: relies on portal-db

The pods which I removed the health check reported that the connection to the port had been refused and could not sync. The other pods which I could not resolve to a ready state or removed the init containers rely on other pods, which I have provided the logs for. The system does not pass the Consul health checks nor the robot check.

I should note that I have tried increasing the time for the health check probes, but it has not resolved my issues.

I also have a few questions regarding OpenStack configuration. I have a feeling that I am not configuring the ./onap/values.yaml file correctly, specifically for appc, nbi, and so.

For appc:

What is openStack Type and Name? How do I determine these values, in Horizon or the CLI?
Do I need to create another user in OpenStack? I am currently using the admin credentials.
Is the TenantName the equivalent to the Project name?
Is the KeystoneUrl the equivalent to the identity service URL?

For nbi:

Is the VNFTenantID the User ID for the admin user? I am also currently using the admin user.

For so:

Do I need to change dmaapTopic? I am currently using "AUTO" (the default)
Similar to the above questions, should I use the admin account? And is the KeyStoneUrl the same as the one for appc?
Do I need to change the mariadbRootPassword?

Sorry for the long post and many questions. Thank you so much!

Beijing install pods fails to start and OpenStack configuration

2 answers