The goal of this DCAE project is to offer PNDA as a deployment option that provides a big-data analytics platform within DCAE.
Meetings: PNDA-DCAE integration is discussed as part of the weekly DCAE call (Thu, UTC 13:00 / China 21:00 / Eastern 09:00 / Pacific 06:00, zoom.us/j/824147956). See DCAE Weekly Meetings.
Overview
Overview presentation: DCAE-PNDA-Overview.pdf.
High level summary of tasks:
- Installation of PNDA within DCAE
- Health check (integrate PNDA with the DCAE health check)
- Enable application deployment on PNDA via DCAE
- Data integration (enable PNDA to receive data from DCAE collectors such as VES)
Release related information
Casablanca M3 Milestone for PNDA integration into DCAE:
- Support for HDFS API
- VES data available in HDFS
- Support for Spark Streaming API
- Support for Spark Batch API
- Jupyter Notebook
PNDA 5.0 Components versions
The source of truth regarding versions is available in the PNDA 5.0 release note.
Component | Version
---|---
Kafka | 1.1.0
Kafka Manager | 1.3.3.15
PNDA Deployment Manager | XXX
PNDA Package Repository | XXX
PNDA Console | XXX
Gobblin | 0.11.0
Flink | 1.4.2
Knox | 1.1.0
HortonWorks | 2.6.5
Hadoop | 2.7.3
HBase | 1.1.2
Hive | 2.1.0
Spark | 1.6.3
Spark | 2.3.0
Oozie | 4.2.0
Grafana | 5.1.3
OpenTSDB | 2.3.0
Consul | 1.0.3
Jupyter | 4.2.1
PNDA APIs
As part of the ongoing DCAE integration with the PNDA data platform, here are some pointers to the APIs that PNDA provides:
- Platform Data Management: https://github.com/pndaproject/platform-data-mgmnt/blob/develop/data-service/README.md
- Platform Deployment Manager: https://github.com/pndaproject/platform-deployment-manager#api-documentation
- Platform Package Repository: https://github.com/pndaproject/platform-package-repository#repository-api
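As a rough illustration of how a DCAE component might drive the Deployment Manager, the sketch below builds (but does not send) the HTTP requests for deploying a package, creating an application from it, and starting that application. The endpoint paths reflect one reading of the Deployment Manager API documentation linked above and should be checked against it; the base URL, package name, and application name are hypothetical, and depending on the PNDA version additional fields (for example a user identity) may be required.

```python
import json
from urllib.request import Request

# Hypothetical base URL; the real host and port depend on the PNDA deployment.
BASE_URL = "http://pnda-deployment-manager.example:5000"


def deploy_package(package: str) -> Request:
    """Build a request asking the Deployment Manager to deploy a package."""
    return Request(f"{BASE_URL}/packages/{package}", method="PUT")


def create_application(application: str, package: str) -> Request:
    """Build a request that creates an application from a deployed package."""
    body = json.dumps({"package": package}).encode("utf-8")
    return Request(
        f"{BASE_URL}/applications/{application}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def start_application(application: str) -> Request:
    """Build a request that starts a previously created application."""
    return Request(f"{BASE_URL}/applications/{application}/start", method="POST")


if __name__ == "__main__":
    # Print the planned calls instead of sending them.
    steps = (
        deploy_package("example-package-1.0.0"),
        create_application("example-app", "example-package-1.0.0"),
        start_application("example-app"),
    )
    for req in steps:
        print(req.get_method(), req.full_url)
```

Each `Request` can be passed to `urllib.request.urlopen` once the base URL points at a real Deployment Manager instance.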
List of JIRA tickets associated with PNDA for DCAE - Casablanca
List of JIRA tickets associated with PNDA for DCAE - Backlog
18 Comments
Donald Hunter
Guidance from OOM team:
Hi Donald,
Thanks for reaching out to the OOM team.
There is some guidance I can provide to expedite the code review/merge process for your new helm charts. A lot of time has been spent towards standardizing helm charts. Although not perfect, and still evolving, we strive for a level of consistency in the templates.
Creating Helm Charts
There is a “starter” helm chart that can be found here: onap-chart. It is not one-size-fits-all, but it provides a basis for most charts. The important part is the values.yaml, which attempts to standardize configuration parameter names (based on Helm best practices) and allows for centralized hierarchical configuration. I would try to pour your specific configuration into this example. I would have recommended that you clone-and-own the dcae-bootstrap helm chart instead, but I see it has eliminated some of the standardized config.
Enable/disable PNDA Deployment
There isn’t really a requirement to have every chart disabled by default and then opt-in. In fact, by default, all ONAP components are deployed out-of-the-box. It can be viewed as a “demo” deployment. Customized deployments can use an override file to disable components as necessary. For PNDA, however, it would make sense to either disable the PNDA bootstrap sub chart or have the sub chart deploy but not spin up PNDA VMs unless a configuration flag is enabled and/or openstack configuration is provided. The configuration flag can be added to the DCAE values.yaml.
Today, each ONAP project (i.e., DCAE, SO) can be enabled/disabled via the values.yaml (+ requirements.yaml) inside the onap parent Helm chart. Unfortunately, this does not provide control over individual subcharts like the PNDA bootstrap sub chart you’re adding to DCAE.
OpenStack Configuration
There are a few projects that need OpenStack configuration. Take a look inside ONAP values.yaml for APPC, NBI and SO configuration. I would use these as examples on how to propagate the configuration you need. There is an effort to consolidate this configuration into a single shared configuration but that is a future deliverable.
Feel free to put up draft patches as soon as possible to get early feedback from the OOM team.
Please let me know if you have any questions, comments or concerns.
Thanks,
Mike.
--
Mike Elliott
ONAP OOM PTL
Senior Architect - Amdocs
Roger Maitland
Hi Donald Hunter, when are you planning to introduce PNDA into DCAE? By the state of the stories it looks like this is Dublin content - correct?
Thanks,
Roger
Donald Hunter
Hi Roger Maitland,
Actually we are planning to integrate several of these stories into Casablanca. #DCAEGEN2-367 is merged but the dcaegen2/deployments job is failing because it needs a larger VM flavour and we have an open helpdesk ticket for that. The stories which touch OOM are queued, waiting for the container artifacts from dcaegen2/deployments.
Cheers,
Donald.
Srinivasa Addepalli
Hi,
As I understand it, there is a deployment manager in PNDA that is used to upload packages (analytics applications), create applications from packages and start applications. A few questions on integration with the rest of ONAP:
Srini
Adding notifications...
Vijay Venkatesh Kumar, Frank Brockners and Donald Hunter
Donald Hunter
Hi Srini,
Your understanding is correct, there is a deployment manager in PNDA that has REST APIs for package upload, application creation and control.
There are open issues relating to deployment integration for ONAP that we still have to plan, to see whether we can scope them for Dublin.
Cheers,
Donald.
Srinivasa Addepalli
Thanks Donald. I guess I assumed that the integration aspects were already worked out. As part of edge-automation, we are trying to see how we can bring up networking analytics apps in remote Spark clusters (deployed at the edge or regional sites). Hence, this integration aspect is very important for us.
Srinivasa Addepalli
Vijay Venkatesh Kumar, Frank Brockners, Donald Hunter, ramki krishnan, Raghu Ranganathan
Hi DCAE team,
We at the edge automation group are studying how analytics applications can be run at the edge or near it. There are three aspects
All of the above need to happen from ONAP-Central.
When an edge site is onboarded in ONAP, ONAP optionally brings up an analytics platform (Spark platform) in edge sites or delegated sites. We also don't want to rule out bringing up the Spark platform by other means in edge or delegated locations.
When a new analytics application is onboarded in ONAP, the application images would need to be sent to the edges, based on the edge configuration. We also don't want to rule out uploading analytics applications using other mechanisms.
When ONAP decides (on what basis is TBD) to submit jobs (streaming and batch), ONAP needs to communicate with the edge platform to submit Spark jobs.
There is the Apache Livy project (https://livy.incubator.apache.org/), which provides server software and clients in different languages. It seems that by integrating the Livy client, ONAP can talk to the various edge analytics platforms over a RESTful API to submit and query jobs. The thought is to leverage the server portion in the analytics platform.
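As a rough illustration of that idea, a batch submission through Livy could be built as in the sketch below. The `/batches` endpoint, the `file`, `className`, and `args` fields, and the default port 8998 come from Livy's REST API; the server host, jar path, and class name are hypothetical placeholders, and the requests are only constructed, not sent.

```python
import json
from typing import Sequence
from urllib.request import Request

# Hypothetical Livy server at an edge site; 8998 is Livy's default port.
LIVY_URL = "http://edge-livy.example:8998"


def submit_batch(jar_path: str, class_name: str, args: Sequence[str]) -> Request:
    """Build a POST /batches request that submits a Spark batch job."""
    payload = {"file": jar_path, "className": class_name, "args": list(args)}
    return Request(
        f"{LIVY_URL}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def batch_state(batch_id: int) -> Request:
    """Build a GET /batches/{id}/state request that polls a submitted job."""
    return Request(f"{LIVY_URL}/batches/{batch_id}/state", method="GET")


if __name__ == "__main__":
    # Hypothetical job artifact and entry point, for illustration only.
    req = submit_batch(
        "hdfs:///apps/example-analytics.jar",
        "org.example.NetworkAnalytics",
        ["--window", "60"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending these requests with `urllib.request.urlopen` against a real Livy server would return JSON describing the batch, including its id and state.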
That said, we want to leverage as much as possible of the work already done in DCAE and PNDA, hence we are looking for suggestions.
We are also looking for suggestions on workflow orchestration of the Spark pipeline (using Oozie or Apache Airflow).
Srini
Srinivasa Addepalli
Frank Brockners and Donald Hunter,
The 'Creating PNDA' section of the PNDA guide talks about bringing up the standard version of PNDA on various cloud technologies such as OpenStack, AWS, and bare-metal servers. In the case of OpenStack, I see a set of HOTs, one for each component or dependency: https://github.com/pndaproject/pnda-cli/tree/develop/heat-templates/standard
Since ONAP is using K8S, I guess there would be Helm charts for each component. Also, as part of the build, we assume there would be a Dockerfile for each of the components.
I searched OOM for PNDA-related Helm charts. I found only two, and even those are not related to the actual components: https://gerrit.onap.org/r/gitweb?p=oom.git;a=tree;f=kubernetes/pnda/charts;h=adc873ef88314d9addc7581cfc951eea1f80715c;hb=HEAD
Can you point me to where the Dockerfiles and Helm charts for the standard PNDA components are located in GitHub or ONAP Gerrit?
Srinivasa Addepalli
Hi,
On Deployment Manager and its integration with K8S, we are hoping that spark-k8s-operator can be used.
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
Let us know whether this or something similar is in your plans for Kubernetes. If it is not on your radar, we may be able to help. Please let us know.
Brian Freeman
I see charts in Casablanca OOM for PNDA but they are not enabled. Is PNDA a Dublin feature then, given where we are in the release cycle?
Donald Hunter
Hi Brian,
The PNDA charts are not enabled by default because they are only supported when the Kubernetes cluster is on OpenStack infra. You need to provide OpenStack API parameters to the helm install so that the PNDA bootstrap container can provision OpenStack VMs for the PNDA cluster.
Cheers,
Donald.
Brian Freeman
So PNDA is not Docker container based?
Donald Hunter
No, it is not. The Hadoop ecosystem has traditionally been bare-metal based. Many Hadoop components can be containerised but the distros are not quite there yet.
Srinivasa Addepalli
I see the following Helm charts in the Helm repository:
HDFS: https://github.com/helm/charts/tree/master/stable/hadoop
Spark: https://github.com/helm/charts/tree/master/stable/spark
Kafka: https://github.com/helm/charts/tree/master/incubator/kafka
As part of the analytics-as-a-service initiative (in R4), the thought is to use Helm-based deployment for everything: reuse existing charts for the open-source packages that PNDA uses, and develop charts for those not yet available in open source (such as OpenTSDB and the PNDA deployment manager). Our main intention is to bring up the analytics framework not only in ONAP (using OOM) but anywhere (edge, regional sites, etc., using site-specific K8S) for doing network analytics.
Of course, we need to work with you to ensure that there is no duplicate effort. Please do let us know what you are planning for R4.
Srini
Donald Hunter
Note that the hadoop chart you linked says this:
"This chart is primarily intended to be used for YARN and MapReduce job execution where HDFS is just used as a means to transport small artifacts within the framework and not for a distributed filesystem. Data should be read from cloud based datastores such as Google Cloud Storage, S3 or Swift."
Srinivasa Addepalli
Sorry. That was meant for Hadoop.
In the case of Spark on K8S, we only require HDFS. Charts are given here: https://github.com/apache-spark-on-k8s/kubernetes-HDFS
These are the ones we think can be used as a base.
Let us know whether they satisfy PNDA's needs.
Donald Hunter
Thanks for the link.
I will take a look at this to see if we can put PNDA services on top.
Donald Hunter
Hi Srinivasa,
We are just working through what to plan in R4 and would definitely like to collaborate with you. I would like to eventually reach a fully containerised analytics-as-a-service solution as you describe, but I don't know if that is achievable in R4 timeframe.
We should be able to leverage existing Dockerfiles and Helm charts for some components like Kafka, OpenTSDB, etc. The HDFS/Spark deployment and the data storage management are the harder part.