
Hello,

I am trying to install the Beijing release on OpenStack with Kubernetes. I have 3 VM's; one for Rancher, and two for Kubernetes. They have the following specs:

Instance Name    VCPUs  Disk   RAM   Time since created
onap_rancher_vm  2      40GB   4GB   6 days, 20 hours
onap-k8s-vm-2    16     100GB  64GB  6 days, 19 hours
onap-k8s-vm-1    16     100GB  64GB  6 days, 19 hours

I am following the quick start guide to install ONAP: https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_quickstart_guide.html. I am fairly new to OpenStack, Kubernetes, and ONAP.

After letting the install run for more than an hour, the following pods fail to become ready:

  • aaf-oauth
  • aaf-service
  • appc
  • brmsgw
  • clamp
  • clamp-dash-es
  • clamp-dash-kibana
  • dbc-pg
  • dcae-cloudify-manager
  • dcae-db
  • dmaap-bus-control
  • portal-app
  • sdc-be
  • sdc-be-config
  • sdc-es
  • sdc-es-config-elasticsearch
  • sdc-fe
  • sdc-kb
  • sdnc-db
  • sdnc-ueb-listener

The following is what I performed to bring the pods to a ready state:

  • aaf-oauth: removed health checks
  • aaf-service: removed health checks
  • appc: removed health checks
    • the logs show the following:
    • Warning Unhealthy 2m (x52 over 11m) kubelet, onap-k8s-vm-1 Readiness probe failed: APPC is not healthy.
      ++ ps -e
      ++ grep startODL
      ++ wc -l
      + startODL_status=1
      ++ /opt/opendaylight/current/bin/client bundle:list
      ++ grep Waiting
      ++ wc -l
      Failed to get the session.
      + waiting_bundles=0
      ++ /opt/opendaylight/current/bin/client system:start-level
      Failed to get the session.
      + run_level=
      + '[' '' == 'Level 100' ']'
      + echo APPC is not healthy.
      + exit 1

  • brmsgw: removed health checks
    • the logs show the following:
    • WARNING: jmx_fqdn= missing name or value
      WARNING: policy_msOnapName= missing name or value
      WARNING: policy_msPolicyName= missing name or value
      Starting configure of brmsgw under policy:policy ownership with umask 0022.
      WARNING: UEB_API_KEY= missing name or value
      WARNING: UEB_API_SECRET= missing name or value
      WARNING: BRMS_UEB_DELAY= missing name or value
      WARNING: BRMS_UEB_API_KEY= missing name or value
      WARNING: BRMS_UEB_API_SECRET= missing name or value
      WARNING: UEB_API_KEY= missing name or value
      WARNING: UEB_API_SECRET= missing name or value
      WARNING: BRMS_UEB_DELAY= missing name or value
      WARNING: BRMS_UEB_API_KEY= missing name or value
      WARNING: BRMS_UEB_API_SECRET= missing name or value

  • clamp: removed health checks

  • clamp-dash-es: deleted the pod; upon restarting, it became ready after the clamp health checks were removed
  • clamp-dash-kibana: became ready after removing health checks for clamp
  • dbc-pg: removed health checks
  • dcae-cloudify-manager: removed health checks
  • dcae-db: removed health checks
  • dmaap-bus-control: became ready after removing health checks for dbc-pg
  • portal-app: removed init-container
    • the init-container relies on a pod created by a job portal-db-config, which relies on portal-db
    • logs for portal-db:
      • 2018-07-06 17:15:04 140190038890368 [Note] mysqld (mysqld 10.2.15-MariaDB-10.2.15+maria~jessie) starting as process 1 ...
        2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
        2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Uses event mutexes
        2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Compressed tables use zlib 1.2.8
        2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Using Linux native AIO
        2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Number of pools: 1
        2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Using SSE2 crc32 instructions
        2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
        2018-07-06 17:15:04 140190038890368 [Note] InnoDB: Completed initialization of buffer pool
        2018-07-06 17:15:04 140189302863616 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
        2018-07-06 17:15:06 140190038890368 [Note] InnoDB: Highest supported file format is Barracuda.
        2018-07-06 17:15:18 140190038890368 [Note] InnoDB: 128 out of 128 rollback segments are active.
        2018-07-06 17:15:18 140190038890368 [Note] InnoDB: Creating shared tablespace for temporary tables
        2018-07-06 17:15:18 140190038890368 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
        2018-07-06 17:15:18 140190038890368 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
        2018-07-06 17:15:18 140190038890368 [Note] InnoDB: 5.7.22 started; log sequence number 3730924
        2018-07-06 17:15:18 140188923713280 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
        2018-07-06 17:15:18 140190038890368 [Note] Plugin 'FEEDBACK' is disabled.
        2018-07-06 17:15:19 140190038890368 [Note] Server socket created on IP: '::'.
        2018-07-06 17:15:20 140190038890368 [Warning] 'proxies_priv' entry '@% root@dev-portal-db-84f48ddccb-cslbv' ignored in --skip-name-resolve mode.
        2018-07-06 17:15:21 140190038890368 [Note] Reading of all Master_info entries succeded
        2018-07-06 17:15:21 140190038890368 [Note] Added new Master_info '' to hash table
        2018-07-06 17:15:21 140190038890368 [Note] mysqld: ready for connections.
        Version: '10.2.15-MariaDB-10.2.15+maria~jessie' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
        2018-07-06 17:15:24 140188923713280 [Note] InnoDB: Buffer pool(s) load completed at 180706 17:15:24
        2018-07-06 17:35:13 140189880547072 [Warning] Aborted connection 51 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 17:35:13 140189883221760 [Warning] Aborted connection 50 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 17:35:13 140189283657472 [Warning] Aborted connection 54 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 17:35:13 140189283960576 [Warning] Aborted connection 53 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 17:35:18 140189880850176 [Warning] Aborted connection 49 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 18:35:13 140189283657472 [Warning] Aborted connection 788 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 18:35:13 140189880547072 [Warning] Aborted connection 786 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 18:35:13 140189283354368 [Warning] Aborted connection 784 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 18:35:13 140189880850176 [Warning] Aborted connection 785 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)
        2018-07-06 18:35:13 140189883221760 [Warning] Aborted connection 787 to db: 'ecomp_sdk' user: 'root' host: '10.42.227.137' (Got timeout reading communication packets)

  • sdc-be: cannot resolve; depends on sdc-es-config-elasticsearch
  • sdc-be-config: cannot resolve; depends on sdc-be
  • sdc-es: removed health check

  • sdc-es-config-elasticsearch: crashes with error in logs:

    • [2018-07-06T18:32:55+00:00] INFO: Retrying execution of ruby_block[check_ElasticSearch_Cluster_Health], 0 attempt(s) left

      ================================================================================
      Error executing action `run` on resource 'ruby_block[check_ElasticSearch_Cluster_Health]'
      ================================================================================

      Errno::ECONNREFUSED
      -------------------
      Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)

      Cookbook Trace:
      ---------------
      /root/chef-solo/cache/cookbooks/init-sdc-elasticsearch/recipes/ES_1_create_audit_template.rb:9:in `block (2 levels) in from_file'

      ...

      System Info:
      ------------
      chef_version=13.8.5
      platform=alpine
      platform_version=3.7.0
      ruby=ruby 2.4.4p296 (2018-03-28 revision 63013) [x86_64-linux-musl]
      program_name=chef-solo worker: ppid=7;start=18:31:15;
      executable=/usr/bin/chef-solo

      [2018-07-06T18:32:57+00:00] INFO: Running queued delayed notifications before re-raising exception
      [2018-07-06T18:32:57+00:00] INFO: Running queued delayed notifications before re-raising exception
      Running handlers:
      [2018-07-06T18:32:57+00:00] ERROR: Running exception handlers
      [2018-07-06T18:32:57+00:00] ERROR: Running exception handlers
      Running handlers complete
      [2018-07-06T18:32:57+00:00] ERROR: Exception handlers complete
      [2018-07-06T18:32:57+00:00] ERROR: Exception handlers complete
      Chef Client failed. 0 resources updated in 01 minutes 41 seconds
      [2018-07-06T18:32:57+00:00] FATAL: Stacktrace dumped to /root/chef-solo/cache/chef-stacktrace.out
      [2018-07-06T18:32:57+00:00] FATAL: Stacktrace dumped to /root/chef-solo/cache/chef-stacktrace.out
      [2018-07-06T18:32:57+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
      [2018-07-06T18:32:57+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
      [2018-07-06T18:32:57+00:00] ERROR: ruby_block[check_ElasticSearch_Cluster_Health] (init-sdc-elasticsearch::ES_1_create_audit_template line 4) had an error: Errno::ECONNREFUSED: Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)
      [2018-07-06T18:32:57+00:00] ERROR: ruby_block[check_ElasticSearch_Cluster_Health] (init-sdc-elasticsearch::ES_1_create_audit_template line 4) had an error: Errno::ECONNREFUSED: Failed to open TCP connection to sdc-es.onap:9200 (Connection refused - connect(2) for "sdc-es.onap" port 9200)
      [2018-07-06T18:32:57+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
      [2018-07-06T18:32:57+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

  • sdc-fe: relies on sdc-kb and the sdc-be-config-backend job
  • sdc-kb: relies on sdc-es-config-elasticsearch
  • sdnc-db: became ready after removing health check for sdc-es
  • sdnc-ueb-listener: relies on sdnc-be
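
For reference, the manual "removed health checks" workaround above can also be applied as a JSON patch instead of editing the pod spec by hand. This is only a sketch: the "onap" namespace and the "dev-aaf-service" deployment name are assumptions for a typical OOM install, so substitute your own resource names.

```shell
# JSON patch that removes the readiness probe from the first container
# of a deployment's pod template.
PATCH='[{"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}]'
echo "$PATCH"

# Applying it requires cluster access; the namespace and deployment
# name below are examples, not taken from this thread's cluster:
#   kubectl -n onap patch deployment dev-aaf-service --type=json -p "$PATCH"
```

The same patch with `livenessProbe` in the path removes the liveness check; repeat per container index if the pod has more than one container.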

The following jobs fail to complete and, after 7 retries, do not try again:

  • sdc-be-config-backend: depends on sdc-be
  • sdc-es-config-elastic-search: depends on sdc-be
  • portal-db-config: relies on portal-db

The pods whose health checks I removed reported that the connection to the port had been refused and that they could not sync. The other pods, which I either could not bring to a ready state or whose init containers I removed, depend on other pods whose logs I have provided above. The system passes neither the Consul health checks nor the robot health check.
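
Since most of these failures reduce to "connection refused" against a dependency, a quick way to confirm whether the dependency service actually has a ready backend is to inspect its endpoints. This is a sketch only; "sdc-es" and the "onap" namespace are taken from the Chef error above and may differ in your deployment:

```shell
# The sdc config jobs die on "Connection refused" to sdc-es.onap:9200,
# i.e. the sdc-es Service has no listening backend yet. Two checks
# (both need a live cluster; names assume the default "onap" namespace):
#
#   kubectl -n onap get endpoints sdc-es   # empty ENDPOINTS => no ready pod
#   curl -sf "$HEALTH_URL"                 # run from inside any cluster pod
#
HEALTH_URL="http://sdc-es.onap:9200/_cluster/health"
echo "$HEALTH_URL"
```

If the endpoints list is empty, the dependent jobs will keep failing no matter how often they are recreated; the backing pod has to become ready first.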

I should note that I have tried increasing the time for the health check probes, but it has not resolved my issues.
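
For what it's worth, many OOM charts expose probe timings in their values, so longer windows can be set through a Helm override rather than by editing deployments. This is a sketch only; the exact keys vary per chart and release, so verify these names against each chart's own values.yaml before relying on them:

```yaml
# override.yaml -- pass to helm with: -f override.yaml
# (liveness/readiness keys follow the common OOM chart pattern;
#  check the target chart before use)
appc:
  liveness:
    initialDelaySeconds: 300   # give OpenDaylight longer to come up
    periodSeconds: 60
  readiness:
    initialDelaySeconds: 300
    periodSeconds: 60
```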



I also have a few questions regarding OpenStack configuration. I have a feeling that I am not configuring the ./onap/values.yaml file correctly, specifically for appc, nbi, and so.

For appc:

  • What are openStackType and openStackName? How do I determine these values, in Horizon or via the CLI?
  • Do I need to create another user in OpenStack? I am currently using the admin credentials.
  • Is the TenantName equivalent to the Project name?
  • Is the KeystoneUrl equivalent to the identity service URL?

For nbi:

  • Is the VNFTenantID the User ID for the admin user? I am also currently using the admin user.

For so:

  • Do I need to change dmaapTopic? I am currently using "AUTO" (the default)
  • Similar to the above questions, should I use the admin account? And is the KeyStoneUrl the same as the one for appc?
  • Do I need to change the mariadbRootPassword?

Sorry for the long post and many questions. Thank you so much!


    2 answers

    1.

      Marcus, I have experienced the same problem with jobs. It seems jobs retry their dependency checks only for a limited time, after which the installation can get stuck without manual intervention. Some pods wait for, e.g., a database schema to be applied, which is done only by one of these jobs.

      To solve it, I exported every failed job definition to a YAML file and removed the two lines containing "controller-uid" from that file. Then I deleted the job with the unsuccessful status from Kubernetes (kubectl delete job failedjob) and created it again (kubectl create -f jobdefinition.yaml).

      Once those jobs finished successfully, the next part of the installation proceeded.
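
      The steps above can be sketched as follows. The job name is only an example from this thread, and the sample manifest is a stand-in for the real `kubectl get job -o yaml` output, included so the controller-uid strip is visible (the label appears under both metadata.labels and spec.selector.matchLabels, which is why two lines are removed):

```shell
JOB=dev-portal-db-config    # example: one of the failed jobs from the question

# 1. Export the failed job (needs cluster access):
#      kubectl get job "$JOB" -o yaml > job.yaml
# Stand-in manifest so the strip step below is demonstrable:
cat > job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: dev-portal-db-config
  labels:
    controller-uid: 0000-example
spec:
  selector:
    matchLabels:
      controller-uid: 0000-example
EOF

# 2. Remove both controller-uid lines so the recreated Job is accepted:
sed -i '/controller-uid/d' job.yaml

# 3. Delete the failed job and recreate it (needs cluster access):
#      kubectl delete job "$JOB"
#      kubectl create -f job.yaml
```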


      2.

        Marcus, this is very good triage work. I think it deserves its own JIRA epic, or a wiki page and onap-discuss article, and each app-specific workaround/issue would need a subtask; I'll link to this work where applicable for some shared issues bringing the system up. On the 11th of June I was able to get everything except 3 pods up on a clean system (AWS this time), but I used an overkill cluster with 4 VMs at 16 cores and 122 GB of RAM each, with 20 Gbps network links. A portion of these failures are random timing related.

        /michael

        1. Marcus D

          Hi Michael,

          Is there documentation regarding what configurations I should use for the OpenStack options in the values.yaml file? I would like to confirm that sharing the admin user among all of these roles is acceptable.

          Thanks,

          Marcus

        2. Gülsüm Atıcı

          Hello,

          In the /oom/kubernetes/onap/values.yaml file, 3 modules (appc, nbi, so) include the OpenStack-related parameters. Which parameters are required to be customized, and how can we do it?

          I couldn't find a clear explanation regarding this file. It would be very useful if there were an example configuration.

          Thanks a lot.


          appc:
            enabled: true
            config:
              openStackType: OpenStackProvider
              openStackName: OpenStack
              openStackKeyStoneUrl: http://localhost:8181/apidoc/explorer/index.html
              openStackServiceTenantName: default
              openStackDomain: default
              openStackUserName: admin
              openStackEncryptedPassword: admin

          nbi:
            enabled: true
            config:
              # openstack configuration
              openStackRegion: "Yolo"
              openStackVNFTenantId: "1234"

          so:
            enabled: true
            replicaCount: 1
            liveness:
              # necessary to disable liveness probe when setting breakpoints
              # in debugger so K8s doesn't restart unresponsive container
              enabled: true
            # so server configuration
            config:
              # message router configuration
              dmaapTopic: "AUTO"
              # openstack configuration
              openStackUserName: "vnf_user"
              openStackRegion: "RegionOne"
              openStackKeyStoneUrl: "http://1.2.3.4:5000"
              openStackServiceTenantName: "service"
              openStackEncryptedPasswordHere: "b51fd164d68bdf2ef9fr94gvtrlk4jkgbr"

          # configure embedded mariadb
          mariadb:
            config:
              mariadbRootPassword: password
