...

In this project, we identify common availability and resiliency problems across ONAP components and provide a high-availability platform (CHAP) with shared services that each ONAP component can simply configure and use to achieve multi-site reliability with efficient failover.

Scope:

CHAP will provide shared services that each ONAP component can use to achieve geo-redundancy. While it can be used even within a site, its expected scope is intra-component, multi-site geo-redundancy and high availability.

Functionality:

CHAP provides three shared services, any of which can be used by ONAP components:

...

two core functionalities:

...

  • MUSIC/mdbc for efficient state/data replication across geo-distributed sites:
  • ONAP components can maintain and access geo-distributed state

...

  • in a shared service called MUSIC (multi-site coordination service) that maintains state in a highly scalable replicated key-value store (Cassandra) but also provides a locking service (built on ZooKeeper) on top of it through which

...

  • the ONAP components can obtain stronger consistency on the shared state only when required.

...

  •  

    CHAP also provides a recipe on top of MUSIC, called mdbc (multi-site DB cache

...

  • )

...

  • that enables ONAP components that maintain state in a SQL database to gain the benefits of MUSIC without changing their SQL DB code.

...

  • These ONAP components can rely on existing DB clustering techniques, such as MariaDB clustering, for replication within a site. mdbc intercepts each read/write call to the DB cluster and mirrors the resulting state to other geo-distributed sites through MUSIC.

...



  • HAL for customizable, consistent failover across geo-distributed sites: While MUSIC ensures that state is available across sites, HAL provides the logic needed to fail over to one of the other sites when a site hosting an ONAP component fails (as detected by OOM):

    • Customizable logic for site-selection and request-redistribution: ONAP components can configure HAL with different policies to decide which site must take over on failure of a site and how requests that were handled by the failed site must be redistributed across the remaining sites.

    • Consistent view of state on failover: Once the new site has been selected for failover, HAL, through its interaction with MUSIC, ensures that the ONAP component at that site has an up-to-date view of the state (if required).
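
The idea of obtaining stronger consistency only when required can be illustrated with a small in-memory toy model. This is a sketch under stated assumptions, not MUSIC's API: the class ToyGeoStore and its methods put_eventual, put_critical, sync and get are hypothetical names, each site is modeled as a plain dictionary, and a thread lock stands in for the ZooKeeper-based locking service.

```python
import threading


class ToyGeoStore:
    """Toy model (hypothetical, not MUSIC): one dict per site, lazy
    replication, and a per-key lock for writes that need to be consistent."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}
        self.pending = []      # (key, value) updates not yet propagated
        self.locks = {}        # key -> threading.Lock
        self._guard = threading.Lock()

    def put_eventual(self, site, key, value):
        # Fast path: write to the local site only and replicate later.
        self.replicas[site][key] = value
        self.pending.append((key, value))

    def sync(self):
        # Propagate queued updates to every site, in arrival order.
        for key, value in self.pending:
            for replica in self.replicas.values():
                replica[key] = value
        self.pending.clear()

    def _lock_for(self, key):
        with self._guard:
            return self.locks.setdefault(key, threading.Lock())

    def put_critical(self, site, key, value):
        # Slow path: hold the key's lock and update all sites before returning.
        with self._lock_for(key):
            self.sync()
            for replica in self.replicas.values():
                replica[key] = value

    def get(self, site, key):
        return self.replicas[site].get(key)


store = ToyGeoStore(["east", "west"])
store.put_eventual("east", "job", "queued")
stale_read = store.get("west", "job")      # not replicated yet
store.put_critical("east", "job", "running")
fresh_read = store.get("west", "job")      # visible at every site
```

In this model a component pays the cost of cross-site coordination only for the writes it marks as critical, while all other writes stay local and fast.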


Scope:

CHAP is a common platform that each ONAP component can use for cross-site redundancy and failover. While its intended granularity is the ONAP component level, it will work well even for micro-services.

Out of scope:

  • While CHAP has the necessary technology to enable state replication and failover within a site, our intended scope for CHAP is strictly cross-site state replication and failover. We assume that each ONAP component will take care of redundancy within a site.
  • While CHAP has the necessary technology to detect failures of sites and ONAP components, we consider that to be the responsibility of OOM, in alignment with the larger ONAP architecture.
  • CHAP is not intended to fail over the entire ONAP ecosystem as a whole; this too is the responsibility of OOM. CHAP's intended use is to enable individual components to fail over across sites in a correct, efficient manner.

Usage:

  • MUSIC will be maintained as a shared service exporting REST end-points that can be used across ONAP components. ONAP components can directly use the REST API provided by MUSIC (like ONAP HAS does now) for state management.

  • If an ONAP component maintains state in a SQL database (and wants to continue doing so), then it simply needs to replace its existing JDBC driver with the mdbc driver.
  • The HAL daemon

...

  • is expected to run as a companion daemon

...

  • to the ONAP component at each site, ideally in the same failure domain as the ONAP component (i.e., they fail together). Hence, it should be a process in the same VM or container as the ONAP component. While HAL currently exports a REST API, we are working on a Java library that components can simply link into their code.
  • ONAP components need to provide HAL with two kinds of scripts: (1) failover scripts, which HAL will call when an ONAP component at one site takes over the requests of a failed component. These could involve very component-specific actions such as updating MSB or DMaaP. (2) Site-selection/request-redistribution scripts, which HAL will use to decide which site (and ONAP component) must take over which requests on failure. For example, a simple policy could be: "the least busy site must take over all requests that were assigned to the failed ONAP component".
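
The example policy quoted above ("the least busy site must take over all requests that were assigned to the failed ONAP component") could be sketched as follows. This is only an illustration: the function names, the site names, and the use of a single numeric load per site are assumptions, and the format in which HAL actually expects such scripts is not specified here.

```python
def select_takeover_site(site_load, failed_site):
    """Return the least busy surviving site (ties broken by site name)."""
    candidates = {s: load for s, load in site_load.items() if s != failed_site}
    if not candidates:
        raise ValueError("no surviving site can take over")
    return min(sorted(candidates), key=candidates.get)


def redistribute(assignments, site_load, failed_site):
    """Reassign every request of the failed site to the selected site."""
    target = select_takeover_site(site_load, failed_site)
    return {
        request: (target if site == failed_site else site)
        for request, site in assignments.items()
    }


# Hypothetical load figures and request assignments.
site_load = {"site-a": 12, "site-b": 3, "site-c": 7}
assignments = {"r1": "site-a", "r2": "site-b", "r3": "site-a"}
new_assignments = redistribute(assignments, site_load, "site-a")
```

A different policy, such as spreading the failed site's requests evenly across the survivors, would change only the body of redistribute.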


Architecture

...

:

The figures below (the CHAP components are highlighted in green) describe how CHAP can be used in a general context and also provide a specific example of its potential usage in ONAP MSO:

  • The usage pattern echoes a common resiliency pattern in ONAP wherein each component is replicated within the site and across sites.

  • For geo-redundant state persistence, the ONAP component can directly use the REST API provided by MUSIC in CHAP (like ONAP HAS does now). However, if it currently uses a database for state persistence, then it can use the mdbc driver, which will intercept its local calls to its database and copy them to MUSIC. The second figure shows how MSO, which maintains state in an H2 database, may use it.

  • Each ONAP component replica runs a companion HAL daemon (halD) that provides health checks, failure detection and failover as mentioned before.
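
The interception pattern described for the mdbc driver can be illustrated with a small wrapper around a local database connection. This is a sketch of the general pattern only, not mdbc itself: mdbc is a JDBC driver, whereas this example uses Python's sqlite3 module as a stand-in for the component's database, mirrors only write statements, and uses a plain list as a stand-in for MUSIC. The class MirroringConnection and the requests table are hypothetical.

```python
import sqlite3


class MirroringConnection:
    """Wraps a local DB connection and forwards every write for replication."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, local, mirror):
        self.local = local    # the component's existing database connection
        self.mirror = mirror  # callable that ships a write to the other sites

    def execute(self, sql, params=()):
        # Apply the statement locally first, exactly as the component expects.
        cursor = self.local.execute(sql, params)
        # Forward write statements so other sites can observe the same state.
        if sql.lstrip().upper().startswith(self.WRITE_VERBS):
            self.mirror(sql, tuple(params))
        return cursor


replicated = []  # stand-in for the geo-replicated store
conn = MirroringConnection(
    sqlite3.connect(":memory:"),
    lambda sql, params: replicated.append((sql, params)),
)
conn.local.execute("CREATE TABLE requests (id TEXT, status TEXT)")
conn.execute("INSERT INTO requests VALUES (?, ?)", ("r1", "pending"))
rows = conn.execute("SELECT status FROM requests WHERE id = ?", ("r1",)).fetchall()
```

The component's SQL stays unchanged; only the object it obtains its connection from differs, which mirrors the goal of swapping the driver rather than rewriting DB code.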

General Usage:






A specific example:

...

  • All three components of CHAP have been open sourced: MUSIC, mdbc, HAL.

  • OOF-Homing Optimizer (HAS) uses CHAP for its state persistence and as a highly available distributed messaging service. This is currently being run in production within AT&T.


...

    • How does this project fit into the rest of the ONAP Architecture?

    CHAP will be available as a common service like DMaaP or AAF, as shown in the red, oblong box below:

    • What other ONAP projects does this project depend on?

    Since OOM is responsible for the life-cycle of ONAP components, it will also need to manage the deployment of CHAP. Further, CHAP (HAL in particular) needs to interact with OOM to perform failover when OOM detects a site failure.


    • How does this align with external standards/specifications?

    Among the components of CHAP, MUSIC and HAL export a REST API, while mdbc is to be used as a JDBC driver.


    • Are there dependencies with other open source projects?

    CHAP depends primarily on Apache Cassandra and Apache ZooKeeper.


...