
Project Name: CHAP - Common High-Availability Platform

Project description:

To achieve five 9s of availability, in a cost-effective manner, on software and infrastructure that individually provide three 9s (or lower), ONAP components need to support multi-site reliability with efficient failover. This has been identified as a Level 3 requirement for carrier-grade reliability that needs to be part of the R2 release. This is an important and challenging problem for three fundamental reasons:

  • Across geo-distributed sites (e.g., Beijing, Amsterdam and Irvine), WAN latencies are much higher and network partitions occur frequently. Hence, current solutions for replicating ONAP components and their state, such as MariaDB clustering, that work very effectively within a site may not scale across geo-distributed sites.

  • The resiliency protocols for failure detection, failover, request federation, etc., especially across multiple sites, involve complex distributed-system protocols replete with corner cases, such as split-brain problems arising from network partitions. Currently, each component is building its own handcrafted solution, which is wasteful and, worse, can be erroneous.

  • ONAP components often have a diverse range of requirements in terms of replication and resiliency. While some components need to carefully manage state across replicas, others may be stateless. Similarly, some of the ONAP components have strict requirements in terms of how the load should be shared across replicas.

In this project, we identify common availability and resiliency problems across ONAP components and provide a high-availability platform (CHAP) with shared services that each ONAP component can simply configure and use to achieve multi-site reliability with efficient failover.

Functionality:

CHAP provides two core functionalities:

  • MUSIC/mdbc for efficient state/data replication across geo-distributed sites: ONAP components can maintain and access geo-distributed state in a shared service called MUSIC (multi-site coordination service). MUSIC maintains state in a highly scalable replicated key-value store (Cassandra) and also provides a locking service (built on Zookeeper) on top of it, through which ONAP components can obtain stronger consistency on shared state only when required.

    CHAP also provides a recipe on top of MUSIC, called mdbc (multi-site DB cache), that enables ONAP components that maintain state in a SQL database to reap the benefits of MUSIC without changing their SQL DB code. These ONAP components can continue to rely on existing DB clustering techniques, like MariaDB clustering, for replication within a site. mdbc will intercept each read/write call to the DB cluster and mirror this state to the other geo-distributed sites through MUSIC.

  • HAL for customizable, consistent failover across geo-distributed sites: While MUSIC ensures that state is available across sites, HAL is a cross-site failover service that provides the logic needed to fail over to one of the other sites when a site hosting an ONAP component fails:

    • Customizable logic for replica-selection and request-redistribution: ONAP components can configure HAL with different policies to decide which of the remaining replicas on other sites must take over on failure of a site and how requests that were handled by the failed site must be redistributed across the remaining replicas.

    • Consistent view of state on failover: Once the new site/replicas have been selected for failover, HAL, through its interaction with MUSIC, ensures that the ONAP component on that site has an up-to-date view (if required) of the state.
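As a rough illustration of how an ONAP component might drive MUSIC's REST endpoints (acquire a lock, then write replicated state), here is a Java sketch using the JDK's HttpClient request builder. The base URL, path layout and payload below are assumptions for illustration only; the actual MUSIC REST API may differ.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class MusicSketch {
    // Hypothetical MUSIC endpoint; real deployments would configure this.
    static final String MUSIC_BASE = "http://music.onap.example:8080/MUSIC/rest";

    // Build a request that acquires a lock on a key before a critical update.
    // The path layout here is an assumption, not MUSIC's documented API.
    static HttpRequest acquireLock(String lockName) {
        return HttpRequest.newBuilder()
                .uri(URI.create(MUSIC_BASE + "/locks/acquire/" + lockName))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Build a request that writes a row into a replicated keyspace/table.
    static HttpRequest putRow(String keyspace, String table, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(MUSIC_BASE + "/keyspaces/" + keyspace
                        + "/tables/" + table + "/rows"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest lock = acquireLock("mso.workflow.42");
        HttpRequest write = putRow("mso", "workflow_state",
                "{\"id\":\"42\",\"state\":\"RUNNING\"}");
        System.out.println(lock.method() + " " + lock.uri());
        System.out.println(write.method() + " " + write.uri());
    }
}
```

The key point is that the component takes the lock only for updates that need strong consistency; ordinary reads can go straight to the replicated store.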


Scope:

CHAP is a common platform that each ONAP component can use for cross-site redundancy and failover. While its intended granularity is at the ONAP component level, it will work well even for micro-services.

Out of scope:

  • While CHAP has the necessary technology to enable state replication and failover within a site, our intended scope for CHAP is strictly cross-site state replication and failover. We assume that each ONAP component will take care of redundancy within a site.
  • While CHAP has the necessary technology to perform failure detection of sites/ONAP components, in alignment with the larger ONAP architecture, we consider that to be the responsibility of OOM.
  • CHAP is not intended to fail over the entire ONAP ecosystem as a whole; this, too, is the responsibility of OOM. CHAP's intended use is to enable individual components to fail over in a correct, efficient manner across sites.

Usage:

  • MUSIC will be maintained as a shared service exporting REST endpoints that can be used across ONAP components. ONAP components can directly use the REST API provided by MUSIC (as ONAP HAS does now) for state management.

  • If an ONAP component maintains state in a SQL database (and wants to continue doing so), it simply needs to replace its existing JDBC driver with the mdbc driver.
  • The HAL daemon is expected to run as a companion daemon to the ONAP component at each site, ideally in the same failure domain as the ONAP component (i.e., they fail together). Hence, it should be a process in the same VM or container as the ONAP component. While HAL currently exports a REST API, we are working on a Java library that components can simply link into their code.
  • ONAP components need to provide HAL with (1) failover scripts that HAL will call when an ONAP component on a site takes over requests corresponding to the failed component; this could involve actions, such as updating MSB or DMaaP, that are very component-specific; and (2) site-selection/request-redistribution scripts that HAL will use to decide which site (and ONAP component) must take over which requests on failure. For example, a simple policy could be: "the least busy site must take over all requests that were assigned to the failed ONAP component".
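The sample "least busy site" policy above can be sketched in a few lines of Java. The Site record and its fields are illustrative stand-ins for whatever per-site load information HAL actually gathers, not part of HAL's API.

```java
import java.util.Comparator;
import java.util.List;

public class LeastBusyPolicy {
    // Illustrative per-site state; field names are assumptions for the sketch.
    record Site(String name, int activeRequests, boolean alive) {}

    // "The least busy site must take over all requests that were assigned to
    // the failed ONAP component": pick the live site with the fewest
    // in-flight requests.
    static Site selectTakeoverSite(List<Site> sites) {
        return sites.stream()
                .filter(Site::alive)
                .min(Comparator.comparingInt(Site::activeRequests))
                .orElseThrow(() -> new IllegalStateException("no live site available"));
    }

    public static void main(String[] args) {
        List<Site> sites = List.of(
                new Site("Beijing", 120, true),
                new Site("Amsterdam", 35, true),
                new Site("Irvine", 80, false)); // the failed site is excluded
        System.out.println(selectTakeoverSite(sites).name()); // prints "Amsterdam"
    }
}
```

A real deployment would plug such a policy in as the site-selection script; the point of HAL is that the component supplies only this small piece of logic, not the surrounding failure-detection and coordination machinery.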


Architecture:

The figures below (the CHAP components are highlighted in green) describe how CHAP can be used in a general context and also provide a specific example of its potential usage in ONAP MSO.





A specific example:



Another lens through which one can view CHAP is that it enables ONAP components to achieve carrier-grade reliability requirements, as shown in the following figure:


Current Status:

  • All three components of CHAP have been open sourced: MUSIC, mdbc, HAL.

  • OOF-Homing Optimizer (HAS) uses CHAP for its state persistence and as a highly available distributed messaging service. This is currently running in production within AT&T.


Architecture Alignment:

    • How does this project fit into the rest of the ONAP Architecture?

    CHAP will be available as a common service, like DMaaP or AAF, as shown in the red, oblong box below:

    • What other ONAP projects does this project depend on?

    Since OOM is responsible for the life-cycle of ONAP components, it will also need to manage the deployment of CHAP. Further, CHAP (HAL in particular) needs to interact with OOM to perform failover when OOM detects a site failure.


    • How does this align with external standards/specifications?

    Among the components of CHAP, MUSIC and HAL export a REST API, while mdbc is to be used as a JDBC driver.
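For mdbc, the driver swap described under Usage amounts to changing only the JDBC URL (and driver class) while leaving the component's SQL code untouched. The `jdbc:mdbc` prefix and host names below are illustrative assumptions; consult the mdbc repository for the actual driver class and URL format.

```java
public class MdbcSwapSketch {
    // Build the JDBC URL; only the prefix changes when switching to mdbc.
    // "jdbc:mdbc" is an assumed prefix for illustration.
    static String jdbcUrl(boolean useMdbc) {
        String prefix = useMdbc ? "jdbc:mdbc" : "jdbc:mariadb";
        return prefix + "://db.site1.example:3306/mso";
    }

    public static void main(String[] args) {
        // The component's SQL code is unchanged: it still calls, e.g.,
        // DriverManager.getConnection(jdbcUrl(true), user, password)
        // and issues the same queries and transactions as before.
        System.out.println(jdbcUrl(false)); // before: local MariaDB cluster only
        System.out.println(jdbcUrl(true));  // after: writes mirrored cross-site via MUSIC
    }
}
```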


    • Are there dependencies with other open source projects?

    CHAP depends primarily on Apache Cassandra and Zookeeper.


Other Information:

  • link to seed code (if applicable)
  • Vendor Neutral
    • if the proposal is coming from an existing proprietary codebase, have you ensured that all proprietary trademarks, logos, product names, etc., have been removed?
  • Meets Board policy (including IPR)


Key Project Facts:

| Facts | Info |
| --- | --- |
| PTL (first and last name) | Bharath Balasubramanian |
| Jira Project Name | CHAP |
| Jira Key | |
| Project ID | |
| Link to Wiki Space | |

Release Components Name:

Note: refer to existing project for details on how to fill out this table

| Components Name | Components Repository name | Maven Group ID | Components Description |
| --- | --- | --- | --- |
| CHAP | | org.onap. | |




Resources committed to the Release:

Note 1: No more than 5 committers per project. Balance the committers list and avoid members representing only one company.

Note 2: It is critical to complete all the information requested; this will help fast-forward the onboarding process.

| Role | First Name Last Name | Linux Foundation ID | Email Address | Location |
| --- | --- | --- | --- | --- |
| PTL | Bharath Balasubramanian | bharathb | bharathb@research.att.com | Bedminster, NJ, USA |
| Committers | Brendan Tschaen | bptschaen | bt054f@att.com | Bedminster, NJ, USA |
| Contributors | | | | |
