...

Another feature that can help achieve a repeatable deployment when faults have reduced the capacity of the cloud is assigning priorities to containers so that mission-critical components are able to evict less critical components.  Kubernetes provides this capability with Pod Priority and Preemption.
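
As a minimal sketch of how this could look (the class name, priority value and image are illustrative placeholders, not the actual OOM settings, and the API version may differ depending on the Kubernetes release), a PriorityClass is defined once for the cluster and then referenced by name in a pod specification:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: onap-mission-critical        # hypothetical class name
value: 1000000                       # pods with higher values are scheduled first and may preempt lower-priority pods
globalDefault: false
description: "Priority for mission-critical ONAP components"
---
apiVersion: v1
kind: Pod
metadata:
  name: example-critical-pod         # illustrative pod
spec:
  priorityClassName: onap-mission-critical
  containers:
  - name: app
    image: example/app:latest        # placeholder image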

Prior to having more advanced carrier-grade features available, the ability to at least reliably re-deploy ONAP (or a subset of it) provides a level of confidence that, should an outage occur, the system can be brought back on-line predictably.

...

A critical factor in being able to recover from an ONAP outage is ensuring that critical state isn't lost after a failure.  Much like ephemeral storage on VMs, any state information stored within a container will be lost once the container is restarted - containers are managed as Cattle, not Pets. To ensure that critical state information is retained after a failure, the OOM deployment specifications for the ONAP components use the Kubernetes concept of Persistent Volumes, an external storage facility with its own lifecycle. The use of a persistent volume is specified in the ONAP deployment specifications.  Here is an example from the sdnc db-deployment.yaml:

...
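
In general terms the pattern is a volume backed by a PersistentVolumeClaim that is mounted into the container; the following is a generic sketch of that pattern with placeholder names, storage sizes and images rather than the actual sdnc values:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-db-pvc               # placeholder claim name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi                   # illustrative size
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-db                   # placeholder deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
      - name: db
        image: mysql:5.7             # placeholder image
        volumeMounts:
        - name: db-data
          mountPath: /var/lib/mysql  # state written here survives container restarts
      volumes:
      - name: db-data
        persistentVolumeClaim:
          claimName: example-db-pvc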

All highly available systems include at least one facility to monitor the health of components within the system.  Such health monitors are often used as inputs to distributed coordination systems (such as etcd, zookeeper, or consul) and monitoring systems (such as nagios or zabbix).  Within ONAP, Consul is the monitoring system of choice and is deployed by OOM in two parts.  A three-way, centralized Consul server cluster is deployed as a highly available monitor of all of the ONAP components.  The Consul server provides a user interface that allows a user to graphically view the current health status of all of the ONAP components for which agents have been created - a sample from the ONAP Integration labs follows.  Monitoring of ONAP components is configured in the agents within JSON files stored in gerrit under the consul-agent-config.
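
The exact contents of those JSON files are kept in gerrit, but a Consul service and health check definition generally follows the shape of the sketch below; the service name, address, port, endpoint and interval here are illustrative placeholders:

{
  "service": {
    "name": "example-onap-component",
    "address": "example-component.onap",
    "port": 8080,
    "checks": [
      {
        "http": "http://example-component.onap:8080/healthcheck",
        "interval": "15s",
        "timeout": "1s"
      }
    ]
  }
}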

...

Initially the Consul agents use the same health monitoring facilities as the robot test infrastructure, which typically just validate that the end-point is reachable.  Some health checks already support more advanced checking - such as validating that a database is able to create, update and delete an entry. Consul exposes an API that allows external agents to use the results of the health checks, such as the Kubernetes "liveness" probes described below.

...

OOM deploys ONAP with Kubernetes as defined by the deployment specifications mentioned earlier.  These same deployment specifications are also used to implement automatic recovery of ONAP components when individual components fail. Once ONAP is deployed, a "liveness" probe starts checking the health of the components after a specified startup time.  These liveness probes can simply check that a port is available, that a built-in health check is reporting good health, or that the Consul health check is positive. Should a liveness probe indicate a failed container, it will be restarted as described in the deployment specification.  Should the deployment specification indicate that there are one or more dependencies on this container or component (for example a dependency on a database), the dependency will be satisfied before the container/component is restarted. This mechanism ensures that, after a failure, all of the ONAP components restart successfully.  Note that deployment specifications were created for all ONAP components during the Amsterdam release, but not all of them are restartable (idempotent).  Further work is required during the Beijing release to ensure recoverability of all the ONAP components.
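
A liveness probe is declared on the container within the deployment specification; a simplified sketch follows, where the health-check path, port and timing values are illustrative rather than the actual OOM settings:

apiVersion: v1
kind: Pod
metadata:
  name: example-component              # illustrative pod
spec:
  containers:
  - name: app
    image: example/component:latest    # placeholder image
    livenessProbe:
      httpGet:                         # could also be a tcpSocket or exec check
        path: /healthcheck
        port: 8080
      initialDelaySeconds: 120         # the "specified startup time" before probing begins
      periodSeconds: 10
      failureThreshold: 3              # container is restarted after repeated failures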

...

Filebeat collects logs from within the namespace of each component and ships them to the centralized logging stack that was deployed by OOM with the other ONAP components.  Users are able to point their web browsers to the Kibana component and see all of the raw logs as well as predefined dashboards that show the state of ONAP in real-time.
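
A minimal sketch of a Filebeat configuration that reads a component's log files and ships them to the logging stack is shown below; the log paths and Logstash host are placeholders rather than the actual OOM values, and the exact syntax varies between Filebeat versions:

filebeat.inputs:
- type: log
  paths:
  - /var/log/onap/*/*.log              # placeholder path for component log files
output.logstash:
  hosts: ["logstash.onap-log:5044"]    # placeholder address of the centralized logging stack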

...

The OOM project is not responsible for creating highly available versions of all of the ONAP components, but it does provide, via Kubernetes, many built-in facilities for building clustered, highly available systems, including: Services with load-balancers (including support for External Load Balancers), Ingress Resources, and Replica Sets. Some of the open-source projects that form the basis of ONAP components directly support clustered configurations, such as ODL with instructions on Setting Up Clustering or MariaDB with Getting Started with MariaDB Galera Cluster.
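
For example, an Ingress Resource can expose a component's service through a single external entry point; the sketch below is illustrative only, with placeholder host, service and port values:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-onap-ingress           # illustrative name
spec:
  rules:
  - host: portal.example.onap.org      # placeholder external host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example-portal       # placeholder service to route traffic to
            port:
              number: 8989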

OOM uses the Kubernetes service abstraction to provide a consistent access point for each of the ONAP components, independent of the pod or container architecture of that component.  For example, the SDN-C component may introduce OpenDaylight clustering at some point and change the number of pods in this component to three or more, but this change will be isolated from the other ONAP components by the service abstraction.  A service can include a load balancer on its ingress to distribute traffic between the pods and even react to dynamic changes in the number of pods if they are part of a replica set. A replica set is a construct that is used to describe the desired state of the cluster.  For example, 'replicas: 3' indicates to Kubernetes that a cluster of 3 instances is the desired state.  Should one of the members of the cluster fail, a new member will be automatically started to replace it.
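
A sketch of this pattern - a service fronting a set of three replicated pods - might look like the following; the names, labels, ports and image are illustrative placeholders rather than the actual SDN-C configuration:

apiVersion: v1
kind: Service
metadata:
  name: example-sdnc                   # stable access point, independent of the number of pods
spec:
  selector:
    app: example-sdnc                  # traffic is load-balanced across all pods with this label
  ports:
  - port: 8282
    targetPort: 8181
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-sdnc
spec:
  replicas: 3                          # desired state: a cluster of 3 instances
  selector:
    matchLabels:
      app: example-sdnc
  template:
    metadata:
      labels:
        app: example-sdnc
    spec:
      containers:
      - name: sdnc
        image: example/sdnc:latest     # placeholder image

Should one of the three pods fail, the underlying replica set starts a replacement automatically, and the service continues to route traffic to whichever pods are healthy.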

...