
This is a draft of a project proposal template. It is not final and is not to be used until the TSC approves it.

Link to Project Proposal training materials

Project Name:

  • Proposed name for the project: DataLake
  • Proposed name for the repository: datalake

Project description:

DMaaP data is read and processed by a variety of ONAP components. DMaaP is backed by Kafka, a publish-subscribe system that is not suited to data query or data analytics. It is therefore useful to persist much of the data that flows through DMaaP in a database, for the following reasons:

  1. Data is kept in permanent storage as a historical record, so DMaaP is free to set its message retention time without having to consider history.

  2. With a database table schema, it is convenient to query and retrieve data.

  3. For data analytics and reporting, accessing data from a database is easier than from DMaaP/Kafka.

In this project, we provide a systematic way to ingest DMaaP data in real time into MongoDB, a document-oriented NoSQL database with a flexible schema, and Druid, a data store designed for real-time OLAP analytics.

Scope:

  • Provide an admin REST API for configuration and topic management. Each topic can be configured with the data stores it is exported to; MongoDB and Druid are supported initially. We may support more NoSQL databases in the future.

  • Provide an SDC/Design time framework UI for management, making use of the above admin REST API.
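The per-topic configuration described above can be pictured as a mapping from topic name to a set of data stores. The following is a minimal in-memory sketch, not the project's API: the names `TopicConfig`, `TopicRegistry`, `configure`, and `stores_for`, and the topic name `example.topic`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Data stores named in the proposal; any other name is rejected.
SUPPORTED_STORES = {"mongodb", "druid"}


@dataclass
class TopicConfig:
    """Hypothetical per-topic setting: which data stores receive the topic."""
    name: str
    stores: set = field(default_factory=set)


class TopicRegistry:
    """In-memory stand-in for the state an admin REST API would manage."""

    def __init__(self):
        self._topics = {}

    def configure(self, name, stores):
        # Reject stores outside the initially supported set.
        unknown = set(stores) - SUPPORTED_STORES
        if unknown:
            raise ValueError(f"unsupported data stores: {sorted(unknown)}")
        self._topics[name] = TopicConfig(name, set(stores))
        return self._topics[name]

    def stores_for(self, name):
        # Unconfigured topics are not exported anywhere.
        config = self._topics.get(name)
        return set(config.stores) if config else set()


registry = TopicRegistry()
registry.configure("example.topic", ["mongodb", "druid"])
```

Keeping the supported stores in one set means that adding another NoSQL database later would be a one-line change in this sketch.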

MongoDB:

  • Monitor selected topics, pull the data in real time, and insert it into MongoDB, with one table (a collection, in MongoDB terms) for each topic, named after the topic.

  • JSON, XML, and YAML data is auto-detected and stored in native MongoDB schema. Data not in these formats is stored as a single string (for now). We may support additional formats.

  • Provide a REST API for data query; applications can also access the data through MongoDB's native API.
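The format detection and per-topic storage described above might look roughly like the sketch below. It is an assumption-laden illustration, not the project's implementation: the function names `xml_to_dict`, `to_document`, and `store` are hypothetical, a plain dict stands in for MongoDB, and YAML is left out because the Python standard library has no YAML parser.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict(element):
    """Convert an XML element into nested dicts (attributes are ignored)."""
    children = list(element)
    if not children:
        return element.text
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def to_document(payload):
    """Return (format, document) for one message payload."""
    try:
        parsed = json.loads(payload)
        if isinstance(parsed, dict):  # only JSON objects become documents
            return "json", parsed
    except ValueError:
        pass
    try:
        root = ET.fromstring(payload)
        return "xml", {root.tag: xml_to_dict(root)}
    except ET.ParseError:
        pass
    # Fallback from the proposal: keep the payload as a single string.
    return "text", {"value": payload}


def store(collections, topic, payload):
    """Append the document to the collection named after the topic."""
    fmt, document = to_document(payload)
    collections.setdefault(topic, []).append(document)
    return fmt


collections = {}  # stand-in for MongoDB: topic name -> list of documents
store(collections, "example.topic", '{"severity": "major"}')
store(collections, "example.topic", "<event><id>7</id></event>")
store(collections, "example.topic", "free-form text")
```

If a YAML check were added, it would probably need to run last and accept only mappings or sequences, because almost any plain string is also a valid YAML scalar.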

Druid:

  • Monitor selected topics, pull the data in real time, and insert it into Druid, with one datasource for each topic, named after the topic.

  • Provide a basic schema for each datasource, which is customizable through a web interface.

  • Integrate Apache Superset for data exploration and visualization.
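One way to picture a "basic schema" that can later be customized is to derive defaults from a sample record and then apply user overrides. The sketch below is hypothetical throughout: the dict layout is not Druid's ingestion spec format, and the names `basic_schema`, `customize`, and `timestamp_field` are assumptions made for illustration.

```python
def basic_schema(topic, sample, timestamp_field="timestamp"):
    """Build a default schema from one sample record (hypothetical layout)."""
    dimensions = sorted(key for key in sample if key != timestamp_field)
    return {
        "datasource": topic,  # same name as the topic
        "timestamp_field": timestamp_field,
        "dimensions": dimensions,
    }


def customize(schema, overrides):
    """Apply user edits, as a web interface might, without mutating the default."""
    updated = dict(schema)
    updated.update(overrides)
    return updated


sample = {"timestamp": "2019-01-01T00:00:00Z", "source": "vnf-1", "severity": "major"}
default = basic_schema("example.topic", sample)
custom = customize(default, {"dimensions": ["severity"]})
```

Returning a new dict from `customize` keeps the generated default available, so a user's changes could be reset.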

Architecture Alignment:

  • How does this project fit into the rest of the ONAP Architecture?
    DataLake provides both an API and a UI. The UI is for analysts to analyze the data, while the API is for other ONAP (and external) components to query the data. For example, UUI can use the API to retrieve historical events. Some DCAE service applications may also make use of the API.
  • What other ONAP projects does this project depend on?
    DataLake depends on DMaaP for data ingestion, and on other common services: OOM, SDC, and MSB.

  • How does this align with external standards/specifications?
    • APIs/Interfaces - REST, JSON, XML, YAML
    • Information/data models - Swagger JSON
  • Are there dependencies with other open source projects?
    • MongoDB
    • Druid
    • Apache Superset

Other Information:

  • link to seed code (if applicable)
  • Vendor Neutral
    • Yes
  • Meets Board policy (including IPR)

Use the above information to create a key project facts section on your project page

Key Project Facts:

  • PTL (first and last name): Guobiao Mo
  • Jira Project Name: DataLake
  • Jira Key: DATALAKE
  • Project ID: datalake
  • Link to Wiki Space:

Release Components Name:

Note: refer to existing project for details on how to fill out this table

  • Components Name: datalake
  • Components Repository name: datalake
  • Maven Group ID: org.onap.datalake
  • Components Description: Data stores for DMaaP data, with data access API and GUI data analysis tools.

Resources committed to the Release:

Note 1: No more than 5 committers per project. Balance the committers list and avoid members representing only one company. Ensure there are at least 3 companies supporting your proposal.

Note 2: It is critical to complete all the information requested; this will help to speed up the onboarding process.

  • PTL: Guobiao Mo; Linux Foundation ID: guobiaomo; Email Address: guobiaomo@chinamobile.com; Location: Milpitas, CA USA. UTC -7
  • Committers: Guobiao Mo; Linux Foundation ID: guobiaomo; Email Address: guobiaomo@chinamobile.com; Location: Milpitas, CA USA. UTC -7
  • Contributors: