...

Permanently persist the data that flows through ONAP, and provide ready-to-use data analytics applications built on the data.

Background

A large amount of data flows among ONAP components, mostly via DMaaP and Web Services. For example, all field events collected by DCAE collectors go through DMaaP. DMaaP is backed by Kafka, a publish-subscribe system in which data is not meant to be permanent and is deleted after a certain retention period. Though some components may store processed results in their local databases, most of the raw data will eventually be lost. We should store this data, which could provide insight into network operation with the help of Big Data, data analytics, and machine learning technologies. In this project, we start by persisting all the raw data through DMaaP.

Project Description

In this project, we will:

  1. Provide a systematic way to ingest DMaaP data in real time into a few selected Big Data storage systems, including, but not limited to, Couchbase (a distributed document-oriented database), Druid (a data store designed for low-latency OLAP analytics), and HBase (a Hadoop database for mass batch processing). Which data goes to which database is configurable, depending on the problems we are trying to solve and the results we want to achieve. For example, by storing data in Druid, an OLAP store, we can integrate it with OLAP tools like Superset and time series tools like Grafana. In the future, new requirements may call for supporting additional storage systems.
  2. Provide sophisticated, ready-to-use interactive analytics tools built on the data. These tools fall into two categories: integrated third-party data analytics tools, such as Superset and Grafana, and custom applications developed by us. Custom applications include ETL applications, Big Data analytics programs developed in the Spark framework, and Machine Learning models. While the integrated third-party tools are mostly GUI tools for human system operators, the results of custom applications are consumed both by system operators and by programs such as ONAP components and external systems (e.g. OSS/BSS).
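The configurable routing described in item 1 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual design: the `TopicRouter` class, the `make_memory_sink` helper, and the topic names are all hypothetical, and in-memory lists stand in for real Couchbase, Druid, and HBase writers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A sink is any callable that accepts a topic name and a decoded message.
Sink = Callable[[str, dict], None]


@dataclass
class TopicRouter:
    """Hypothetical router: sends each message to the sinks configured for its topic."""
    sinks: Dict[str, Sink]          # sink name -> writer function
    routes: Dict[str, List[str]]    # topic name -> list of sink names
    default_sinks: List[str] = field(default_factory=list)

    def dispatch(self, topic: str, message: dict) -> List[str]:
        """Write one message to every sink configured for its topic.

        Topics without an explicit route fall back to default_sinks.
        Returns the names of the sinks that received the message.
        """
        targets = self.routes.get(topic, self.default_sinks)
        for name in targets:
            self.sinks[name](topic, message)
        return list(targets)


def make_memory_sink(store: list) -> Sink:
    """Hypothetical stand-in for a real database writer; appends to a list."""
    def write(topic: str, message: dict) -> None:
        store.append((topic, message))
    return write


# In-memory stand-ins for the storage systems named in item 1.
couchbase_docs: list = []
druid_rows: list = []
hbase_rows: list = []

router = TopicRouter(
    sinks={
        "couchbase": make_memory_sink(couchbase_docs),
        "druid": make_memory_sink(druid_rows),
        "hbase": make_memory_sink(hbase_rows),
    },
    # Assumed topic names; a real mapping would be loaded from configuration.
    routes={
        "example.measurement.topic": ["druid", "hbase"],
        "example.fault.topic": ["couchbase", "druid"],
    },
    default_sinks=["couchbase"],
)

delivered = router.dispatch("example.measurement.topic", {"value": 42})
```

Keeping the topic-to-store mapping as plain data means a new storage system can be supported by registering one more sink and editing the routes, without touching the dispatch logic.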

Architecture

[Image: architecture diagram]

The data storage systems and associated tools are infrastructure external to ONAP; they are either installed once at the outset or provided by existing infrastructure. Since custom settings and applications will be deployed to and run on them, they are effectively integral parts of DataLake.

...