Project Name: CHAP - Common High-Availability Platform

Project description:

To achieve 5 9s of availability on 3 9s (or lower) software and infrastructure in a cost-effective manner, ONAP components need to support multi-site reliability with efficient failover. This has been identified as a Level 3 requirement for carrier-grade reliability that needs to be part of the R2 release. This is an important and challenging problem for three fundamental reasons:

  • Across geo-distributed sites (e.g., Beijing, Amsterdam, and Irvine), WAN latencies are much higher and network partitions occur frequently. Hence, current solutions for replicating ONAP components and their state, such as MariaDB clustering, that work very effectively within a site may not scale across geo-distributed sites.

  • The resiliency protocols for failure detection, failover, request federation, etc., especially across multiple sites, involve complex distributed-system protocols replete with corner cases, such as split-brain scenarios caused by network partitions. Currently, each component is building its own handcrafted solution, which is wasteful and, worse, can be erroneous.

  • ONAP components often have a diverse range of replication and resiliency requirements. While some components need to carefully manage state across replicas, others may be stateless. Similarly, some ONAP components have strict requirements on how load should be shared across replicas.

In this project, we identify common availability and resiliency problems across ONAP components and provide a high-availability platform (CHAP) with shared services that each ONAP component can simply configure and use to achieve multi-site reliability with efficient failover.

Scope:

CHAP will provide shared services that each ONAP component can use to achieve geo-redundancy. While it can be used even within a site, its expected scope is intra-component multi-site geo-redundancy and high availability.

Functionality:

CHAP provides three shared services, any of which can be used by ONAP components:

  1. A multi-site coordination (MUSIC) service in which ONAP components can maintain and access geo-distributed state. MUSIC maintains state in a highly scalable key-value store (Cassandra) and also provides a locking service (built on Zookeeper) on top of it, through which ONAP components can obtain stronger consistency on the shared state. MUSIC exports a REST API that can be used by the ONAP components.
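As a rough illustration of this interaction pattern, the sketch below builds the kind of REST resource path and lock identifier a component might use against MUSIC. The path layout and lock-naming scheme here are assumptions for the example, not the documented MUSIC API.

```java
// Illustrative sketch only: the REST path layout and lock naming below
// are assumptions for this example, not the documented MUSIC API.
public class MusicClientSketch {
    private final String baseUrl;

    public MusicClientSketch(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Build a (hypothetical) REST path addressing one row of replicated state.
    public String rowUrl(String keyspace, String table, String rowId) {
        return baseUrl + "/keyspaces/" + keyspace
                + "/tables/" + table + "/rows/" + rowId;
    }

    // Build a (hypothetical) lock id; a component would acquire this lock
    // via the Zookeeper-backed locking service before updating shared state
    // that needs stronger consistency than the key-value store alone gives.
    public String lockId(String keyspace, String table, String rowId) {
        return keyspace + "." + table + "." + rowId;
    }
}
```

The point of the sketch is the division of labor: plain reads/writes go straight to the replicated key-value store, while updates that must be serialized across sites are guarded by a lock acquired first.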

  2. A multi-site DB cache (mdbc) that allows ONAP components that maintain state in a database to gain the benefits of MUSIC without changing their SQL DB code. While some ONAP components may want to use MUSIC directly to maintain state, others may already maintain state in a SQL database; within a site, such components can rely on existing DB clustering techniques like MariaDB clustering. mdbc intercepts each read/write call to the DB and mirrors this state to other geo-distributed sites through MUSIC. mdbc is implemented as a JDBC driver, so ONAP components can simply replace their existing JDBC driver with the mdbc driver to gain geo-redundancy.
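The drop-in nature of this swap can be sketched as follows; the "jdbc:mdbc:" URL scheme used here is an assumption for illustration, not a confirmed mdbc connection string.

```java
// Hypothetical sketch of the mdbc drop-in pattern: the component changes
// only its JDBC URL/driver, and its SQL statements stay untouched.
// The "jdbc:mdbc:" scheme below is an assumption for illustration.
public class MdbcSwapSketch {

    // Rewrite an existing MariaDB JDBC URL to the (assumed) mdbc scheme.
    public static String toMdbcUrl(String mariaDbUrl) {
        return mariaDbUrl.replaceFirst("^jdbc:mariadb:", "jdbc:mdbc:");
    }

    // In the component, the only code change would then look roughly like:
    //   String url = toMdbcUrl("jdbc:mariadb://localhost:3306/mso");
    //   Connection conn = DriverManager.getConnection(url, user, pass);
    // after which local reads/writes are mirrored to the other
    // geo-distributed sites through MUSIC.
}
```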

  3. Configurable high-availability (HAL) recipes for distributed failure detection, failover, leader election, and request redistribution. HAL is expected to run as a companion daemon to each ONAP component replica on each site in a geo-distributed set-up. The HAL daemons across the sites, working together, can perform several functions: (i) detect failures in their companion ONAP component replicas and attempt to bring them back; (ii) detect failures in other HAL daemons; (iii) perform leader election and failover in case of replica or entire-site failures; and (iv) redistribute requests of the failed ONAP component replica to other replicas through a policy-driven optimization framework. Internally, HAL uses MUSIC to maintain its own state. HAL exports a REST API, and ONAP components can simply configure HAL according to their own needs by providing keep-alive scripts and request re-routing policies.
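A component's HAL configuration might then look something like the fragment below. Every field name and value here is hypothetical, invented only to show the shape of a keep-alive plus re-routing-policy configuration; the actual HAL schema may differ.

```json
{
  "component": "mso",
  "site": "beijing-1",
  "keepAliveScript": "/opt/chap/hal/check_mso.sh",
  "checkIntervalSeconds": 5,
  "restartAttempts": 3,
  "reroutePolicy": {
    "strategy": "least-loaded-site",
    "excludeSites": []
  }
}
```

The keep-alive script is the component-specific piece: HAL itself stays generic, and each ONAP component supplies its own health probe and policy.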

Usage Diagrams:

The figures below (CHAP components are highlighted in green) describe how CHAP can be used in a general context and also provide a specific example of its potential usage in ONAP MSO:

  • The usage pattern echoes a common resiliency pattern in ONAP wherein each component is replicated within the site and across sites.

  • For geo-redundant state persistence, the ONAP component can directly use the REST API provided by MUSIC in CHAP (as ONAP HAS does now). However, if it currently uses a database for state persistence, it can use the mdbc driver, which will intercept its local calls to its database and copy the state to MUSIC. The second figure shows how MSO, which maintains state in an H2 database, may use it.

  • Each ONAP component replica runs a companion HAL daemon (halD) that provides health checks, failure detection, and failover as mentioned before.


General Usage:


A specific example:




Another prism through which one can view CHAP is that it enables ONAP components to achieve carrier-grade reliability requirements, as shown in the following figure:

Current Status

  • All three components of CHAP have been open-sourced: MUSIC, mdbc, and HAL.


  • OOF-Homing Optimizer (HAS) uses CHAP for its state persistence and as a highly available distributed messaging service. This is currently being run in production within AT&T.


Architecture Alignment:

    • How does this project fit into the rest of the ONAP Architecture?

    CHAP will be available as a common service like DMaaP or AAF, as shown in the red oblong box below:

    • What other ONAP projects does this project depend on?

    Since OOM is responsible for the life-cycle of ONAP components, it will also need to manage the deployment of CHAP.


    • How does this align with external standards/specifications?

    Among the components of CHAP, MUSIC and HAL export a REST API, while mdbc is used as a JDBC driver.


    • Are there dependencies with other open source projects?

    CHAP depends primarily on Apache Cassandra and Zookeeper.


Other Information:

  • link to seed code (if applicable)
  • Vendor Neutral
    • if the proposal is coming from an existing proprietary codebase, have you ensured that all proprietary trademarks, logos, product names, etc., have been removed?
  • Meets Board policy (including IPR)


Key Project Facts:

Facts | Info
PTL (first and last name) | Bharath Balasubramanian
Jira Project Name | CHAP
Jira Key |
Project ID |
Link to Wiki Space |

Release Components Name:

Note: refer to existing project for details on how to fill out this table

Components Name | Components Repository Name | Maven Group ID | Components Description
CHAP |  | org.onap. |




Resources committed to the Release:

Note 1: No more than 5 committers per project. Balance the committers list and avoid members representing only one company.

Note 2: It is critical to complete all the information requested, that will help to fast forward the onboarding process.

Role | First Name Last Name | Linux Foundation ID | Email Address | Location
PTL | Bharath Balasubramanian | bharathb | bharathb@research.att.com | Bedminster, NJ, USA
Committers |  |  |  |
Contributors |  |  |  |