...

DMaaP data is read and processed by many ONAP components. DMaaP is backed by Kafka, a publish-subscribe system that is not suitable for data query and data analytics. Additionally, data in Kafka is not meant to be permanent storage, and it gets deleted after a certain retention period. Thus it is useful to persist the data that flows through DMaaP to databases, with the following benefits:

...

In this project, we provide a systematic way to ingest DMaaP data into permanent storage in real time, and we provide analytics tools and applications built on that data.
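
To make the intended data flow concrete, the following is a minimal sketch (not the actual DataLake implementation) of polling a DMaaP Message Router topic over its REST API and handing each JSON message to a persistence step. The host, port, topic name and the save_to_store stub are illustrative placeholders.

    import json
    import requests

    # Illustrative Message Router endpoint; host, port, topic and consumer ids are placeholders.
    MR_URL = "http://message-router:3904/events/unauthenticated.VES_MEASUREMENT_OUTPUT/datalake-group/consumer-1"

    def save_to_store(doc):
        # Placeholder for the real persistence step (e.g. Couchbase or Druid).
        print("persisting event", doc.get("event", {}).get("commonEventHeader", {}).get("eventId"))

    def poll_forever():
        while True:
            # Message Router long-polls; the timeout parameter is in milliseconds.
            resp = requests.get(MR_URL, params={"timeout": 15000})
            resp.raise_for_status()
            for raw in resp.json():  # the response is a JSON array of message strings
                save_to_store(json.loads(raw))

    if __name__ == "__main__":
        poll_forever()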

DataLake's goals are:

  1. Provide a systematic way to ingest DMaaP data in real time to Couchbase, a distributed document-oriented database, and to Druid, a data store designed for low-latency OLAP analytics (see the persistence sketch after this list).
  2. Serve as a common document storage for other ONAP components as well, with easy access.
  3. Provide data-access APIs and ways for ONAP components and external systems (e.g. OSS/BSS) to consume the data.
  4. Provide sophisticated, ready-to-use interactive analytics GUI tools built on the data.
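
The following is a minimal sketch, assuming the Couchbase Python SDK, of how a DMaaP message could be persisted as a JSON document (goal 1). The connection string, credentials, bucket name and key scheme are illustrative placeholders, not part of any agreed design.

    from couchbase.auth import PasswordAuthenticator
    from couchbase.cluster import Cluster
    from couchbase.options import ClusterOptions

    # Connection details are placeholders for whatever a given deployment uses.
    cluster = Cluster("couchbase://couchbase-host",
                      ClusterOptions(PasswordAuthenticator("datalake", "datalake-password")))
    bucket = cluster.bucket("dmaap")          # hypothetical bucket holding DMaaP messages
    collection = bucket.default_collection()

    def persist_event(topic, event):
        # Key by topic plus the event's own id so that re-ingesting a message is idempotent.
        key = f"{topic}::{event['event']['commonEventHeader']['eventId']}"
        collection.upsert(key, event)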

Architecture

The data storage and associated tools are infrastructures external to ONAP, to be installed only once initially, or existing infrastructures may be reused. Since custom settings and applications will be deployed to them, they are really an integrated part of DataLake.

Scope

Data Sources

  • Monitor all or selected DMaaP topics, read the data in real time, and persist it.

  • Other ONAP components can use DataLake as storage to save application-specific data, through DMaaP or DataLake REST APIs (see the publish sketch after this list).

  • Other data sources will be supported if needed.
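
As a sketch of the second bullet, a component that wants its application-specific data persisted could simply publish it to a DMaaP topic that DataLake monitors. The snippet below uses the Message Router publish endpoint; the topic name and record fields are made up for illustration.

    import requests

    # Hypothetical topic that an ONAP component publishes its own data to and that DataLake monitors.
    PUBLISH_URL = "http://message-router:3904/events/org.onap.example.APP_DATA"

    def publish(record):
        # Message Router accepts a JSON payload on POST /events/{topic}.
        resp = requests.post(PUBLISH_URL, json=record)
        resp.raise_for_status()

    publish({"component": "example-app", "metric": "queue_depth", "value": 42})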

...

  • Monitor selected topics, pull the data in real time, and insert it into Druid, with one datasource for each topic, using the same datasource name as the topic name.

  • Extract the dimensions and metrics from the JSON data, and pre-configure Druid settings for each datasource; these settings are customizable through a web interface (see the ingestion-spec sketch after this list).

  • Integrate Apache Superset for data exploration and visualization, and provide pre-built interactive dashboards.

  • Integrate Grafana for time series analytics.
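
To illustrate what the pre-configured Druid settings amount to, the sketch below submits a minimal Kafka ingestion (supervisor) spec to the Druid Overlord, creating one datasource named after the topic. It assumes a recent Apache Druid release; the topic, broker address, field names and granularities are examples only, not the defaults DataLake would actually generate.

    import requests

    TOPIC = "unauthenticated.VES_MEASUREMENT_OUTPUT"   # example topic / datasource name

    # Minimal Kafka ingestion spec; field names and granularities are illustrative.
    supervisor_spec = {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": TOPIC,                    # same name as the topic
                "timestampSpec": {"column": "startEpochMicrosec", "format": "micro"},
                "dimensionsSpec": {"dimensions": ["sourceName", "eventName"]},
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {"segmentGranularity": "HOUR", "queryGranularity": "MINUTE"},
            },
            "ioConfig": {
                "topic": TOPIC,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": "message-router-kafka:9092"},
            },
        },
    }

    # Submit to the Overlord; host and port depend on the Druid deployment.
    requests.post("http://druid-overlord:8090/druid/indexer/v1/supervisor",
                  json=supervisor_spec).raise_for_status()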

Other Stores

  • Based on future requirements, other storage technologies may be supported. For example, if we need to support unstructured data in the future, we will consider search-engine technologies such as the Elastic Stack (ELK).

Architecture Alignment

  • How does this project fit into the rest of the ONAP Architecture?
    DataLake provides both API and UI interfaces. The UI is for analysts to analyze the data, while the API is for other ONAP (and external) components to query the data. For example, UUI can use the API to retrieve historical events (see the query sketch at the end of this section). Some DCAE service applications may also make use of the APIs.
    • What other ONAP projects does this project depend on?
      DataLake depends on DMaaP for data ingestion, and also on some other common services: OOM, SDC and MSB.

  • In Relation to Other ONAP Components
    • DCAE focuses on being a part of the automated closed control loop on VNFs; storing collected data for archiving has not been covered by the DCAE scope (see the ONAP wiki forum). We envision that some DCAE analytics applications may use the data in DataLake.
    • PNDA is an infrastructure that bundles a wide variety of big data technologies for data processing. Applications are to be developed on the technologies provided by PNDA. The goal of DataLake is to store DMaaP and other data, and to build ready-to-use applications around the data using suitable technologies, whether or not they are provided by PNDA. Currently Couchbase, Druid and Superset are not included in PNDA.
    • The Logging project's data source is logs, which are unstructured and whose content is at the mercy of developers, who usually output only a portion of the information. Besides, applications need to follow the ONAP Application Logging Specification v1.2 (Casablanca). On the other hand, DataLake's data source is DMaaP. Since the data from DMaaP is meant to be consumed by ONAP components, as far as we know it is all structured. Since DataLake is unintrusive, other components do not need to make any changes to benefit from DataLake.
    • ONAPARC-233, "Platform data management layer (consolidate multiple DBs)", sounds like it has some overlap with DataLake's scope, but there is no detail in that proposal; we will comment when more details are revealed.
  • How does this align with external standards/specifications?
    • APIs/Interfaces  - REST, JSON, XML, YAML
    • Information/data models - Swagger JSON
  • Are there dependencies with other open source projects?
    • Couchbase
    • Apache Druid
    • Apache Superset
    • Grafana
    • Apache Spark
    All use Apache 2.0 License.
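
As a sketch of the kind of query an ONAP component such as UUI could issue to retrieve historical events, the snippet below runs a Druid native scan query against the Broker. The datasource, interval, columns and host are placeholders, and the eventual DataLake API may expose this differently.

    import requests

    # Illustrative native scan query; datasource, interval and columns are placeholders.
    query = {
        "queryType": "scan",
        "dataSource": "unauthenticated.VES_MEASUREMENT_OUTPUT",
        "intervals": ["2018-10-01/2018-11-01"],
        "columns": ["__time", "sourceName", "eventName"],
        "limit": 100,
    }

    # The Broker's native query endpoint; host and port depend on the deployment.
    resp = requests.post("http://druid-broker:8082/druid/v2/", json=query)
    resp.raise_for_status()
    for batch in resp.json():           # each result batch covers one segment
        for event in batch["events"]:
            print(event)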

...

  • link to seed code (if applicable)
  • Vendor Neutral
    • Yes
  • Meets Board policy (including IPR)
  • A JIRA ticket has been created for the 2018-10-29 Arc Dublin F2F.


...