Edge Scoping MVP for Casablanca - ONAP Enhancements

Summary: Edge Scoping

Distributed Cloud Infrastructure Object Hierarchy, Cloud-agnostic Placement Policy & Homing Policies

Value:

5G VNF Placement
Improve ONAP Deployability through Cloud-Agnostic Intent for Intra-DC Placement

References: ONAP R3+ Cloud Infrastructure Modeling; Cloud Infrastructure Aggregate Representation Classes

ONAP Component	Life Cycle Phase	Enhancements
Policy	Design	Define Distributed Cloud Infrastructure Placement Policies (Note 3); no enhancement needed to the Policy Framework. Leverage the Standardized Distributed Cloud Infrastructure Object Hierarchy & Capabilities from A&AI.
Multi-Cloud	Deploy	Support Distributed Cloud Infrastructure Capability Discovery (Note 1, Note 2). SO ↔ MC IaaS intent-based workload instantiation API to support cloud-agnostic intent for Compute/Network/Storage; MC to translate cloud-agnostic intent from SO into cloud-specific placement attributes (Note 7).
A&AI	Deploy	Support Standardized Distributed Cloud Infrastructure Object Hierarchy & Capability Database. Loose coupling between HW objects (private cloud) and SW objects (private and public clouds). Includes Standardized Capabilities across clouds & Capabilities unique to certain clouds. Note: the Multi-Cloud Distributed Cloud Infrastructure Capability Discovery process will populate this database.
OOF	Deploy	Execute Distributed Cloud Infrastructure Placement Policies for Optimized Service/VNF Placement across Cloud Regions (Note 4)
SO	Deploy	Extend SO ↔ OOF API to support cloud-agnostic intent (Note 5)
TBD (Multi-Cloud or OOF)	Deploy	Placement Service per Cloud Region: Capacity Check, Resource Reservation

Assumption for Policy, SO, OOF:

This uses the current Generic VNF workflow in SO

Note 1:

Configured Capacity and Utilized (or Currently Used) Capacity are managed by the specific cloud.

Note 2:

Cloud SW Capability example
- Cloud region "x" with SR-IOV, GPU, Min-guarantee support
- Cloud region "y" with SR-IOV support
Cloud HW Capability example
- Resource cluster "xa" in Cloud region "x" with SR-IOV and GPU support
- Resource cluster "xb" in Cloud region "x" with GPU support
- Resource cluster "ya" in Cloud region "y" with SR-IOV support
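
The capability example above can be sketched as a small object hierarchy with a lookup. This is an illustrative sketch only: the class names, field names, and the `find_placements` helper are assumptions and do not reflect the A&AI schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceCluster:
    # Hypothetical HW object: a resource cluster inside a cloud region.
    name: str
    hw_capabilities: frozenset


@dataclass
class CloudRegion:
    # Hypothetical SW object: a cloud region with its own SW capabilities.
    name: str
    sw_capabilities: frozenset
    clusters: list = field(default_factory=list)


def find_placements(regions, required_sw, required_hw):
    """Return (region, cluster) name pairs that satisfy both capability sets."""
    matches = []
    for region in regions:
        # The region must advertise every required SW capability.
        if not required_sw <= region.sw_capabilities:
            continue
        for cluster in region.clusters:
            # The cluster must advertise every required HW capability.
            if required_hw <= cluster.hw_capabilities:
                matches.append((region.name, cluster.name))
    return matches


# Data taken from the Note 2 example.
REGIONS = [
    CloudRegion("x", frozenset({"SR-IOV", "GPU", "Min-guarantee"}),
                [ResourceCluster("xa", frozenset({"SR-IOV", "GPU"})),
                 ResourceCluster("xb", frozenset({"GPU"}))]),
    CloudRegion("y", frozenset({"SR-IOV"}),
                [ResourceCluster("ya", frozenset({"SR-IOV"}))]),
]
```

Keeping SW capabilities on the region and HW capabilities on the cluster mirrors the loose coupling between HW and SW objects described for A&AI.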

Note 3:

5G Service/VNF placement example
- Constraints used by Optimization Framework (OOF)
  - 5G CU-UP VNF location to be fixed to a specific physical DC based on 5G DU, bounded by a max distance from 5G DU
- Optimization Policy used by OOF
  - Choose the optimized cloud region (or instance) for the placement of the 5G CU-UP for a subscriber group, based on the above constraints
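
A minimal sketch of the constraint and optimization policy above, assuming that "distance" is great-circle distance and that "optimized" means nearest to the DU. Both are assumptions made for illustration; the function names and the coordinates in the usage example are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def choose_dc(du_location, candidate_dcs, max_distance_km):
    """Pick the physical DC for the CU-UP.

    Constraint: the DC must lie within max_distance_km of the DU.
    Optimization (assumed): among the DCs in range, choose the nearest one.
    Returns the DC name, or None when no DC satisfies the constraint.
    """
    distances = {name: haversine_km(du_location, loc)
                 for name, loc in candidate_dcs.items()}
    in_range = {name: d for name, d in distances.items() if d <= max_distance_km}
    if not in_range:
        return None
    return min(in_range, key=in_range.get)
```

For example, `choose_dc((0.0, 0.0), {"dc-a": (0.0, 0.1), "dc-b": (0.0, 1.0)}, 50)` selects `"dc-a"`, since `dc-b` is roughly 111 km away and violates the 50 km bound.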

Note 4:

For the 5G Service/VNF placement example in Note 3
- 5G CU-UP VNF preferably maps to a specific Cloud region & Physical DC End Point

Note 5:

For the 5G Service/VNF placement example in Note 3
- OOF will pass the Physical DC End Point to SO as a cloud agnostic intent

Note 6:

For the 5G Service/VNF placement example in Note 3
- SO passes the Physical DC End Point to Multi-Cloud as a cloud agnostic intent, besides the Cloud Region

Note 7:

Cloud agnostic placement attributes are targeted to abstract the following cloud specific placement attributes
- HPA attributes (e.g. Smart NIC Family, GPU Family) based on Cloud specific HW/SW support
- Normalized CPU capacity for VMs/Containers based on Cloud specific HW support
  - Reference for CPU Normalization: https://d1.awsstatic.com/whitepapers/Demystifying_vCPUs.df200b766578b75009ad8d15c72e493d6408c68a.pdf
- Fine-grained Placement attributes based on Cloud specific SW support
  - e.g., Rack-level Anti-affinity -> Azure: Fault-Domain, AWS: Placement-Group
  - e.g., Exclusivity -> Azure: Isolated VM, AWS: Dedicated Host
  - e.g., Fine grained QoS -> VMware Minimum guarantee, Kubernetes Burstable Class
Cloud Agnostic Intent-based API – Example 1
- Intent: Rack-level Anti-affinity for VMs within a VNF
- Different Realization for different clouds
  - OpenStack-based (open source, Wind River, VMware VIO)
    - Heat Template – policy-group, anti-affinity policy
  - Azure
    - Fault-domain
  - AWS
    - Placement group
For the 5G Service/VNF placement example in Note 3
- Multi-Cloud interprets cloud agnostic intent as Physical DC Endpoint and translates to cloud-specific placement attribute such as Availability Zone
  - For this example, each distributed physical DC is in a separate Availability Zone for an OpenStack-based Cloud
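
The intent-to-realization mappings listed in this note can be sketched as a lookup table. This is not the Multi-Cloud API: the intent keys, cloud-type keys, and the `translate_intent` function are hypothetical names, and only the mapped values come from the examples above.

```python
# Cloud-agnostic intent -> cloud-specific realization, per cloud type.
INTENT_REALIZATIONS = {
    "rack-level-anti-affinity": {
        "openstack": "Heat template: policy-group with anti-affinity policy",
        "azure": "Fault-Domain",
        "aws": "Placement-Group",
    },
    "exclusivity": {
        "azure": "Isolated VM",
        "aws": "Dedicated Host",
    },
    "fine-grained-qos": {
        "vmware": "Minimum guarantee",
        "kubernetes": "Burstable Class",
    },
}


def translate_intent(intent, cloud_type):
    """Translate a cloud-agnostic intent into a cloud-specific attribute.

    Raises LookupError when the target cloud has no known realization,
    so the caller can reject the placement instead of silently ignoring
    the intent.
    """
    try:
        return INTENT_REALIZATIONS[intent][cloud_type]
    except KeyError:
        raise LookupError(f"no realization of {intent!r} on {cloud_type!r}") from None
```

With this table, `translate_intent("rack-level-anti-affinity", "azure")` yields `"Fault-Domain"`, while an unsupported combination such as exclusivity on OpenStack raises an error.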

Intra-DC networking with Intent-based Cloud Agnostic API (SO → MC → Cloud Region)

References: https://wiki.onap.org/download/attachments/28379482/ONAP-mc-sdn-v2.pptx?api=v2; https://wiki.onap.org/download/attachments/11928197/ONAP-mc-sdn.pdf?version=1&modificationDate=1506518708000&api=v2

Value:

5G High Performance Networking
Improve ONAP Deployability through Cloud-Agnostic Intent for Intra-DC network interconnect

ONAP Component	Life Cycle Phase	Enhancements
Policy	Design	Define Distributed Cloud Infrastructure Network Intent per Cloud Region (Note 1). Leverage the Standardized Distributed Cloud Infrastructure Object Hierarchy & Capabilities from A&AI.
Multi-Cloud (MC)	Deploy	MC <-> Private/Public SDN-DC, per cloud region interaction. Translate cloud-agnostic intent to cloud-specific intent or imperative APIs by querying Policy. Return network endpoint (used for VNF placement).
SO	Deploy	(SO -> MC -> Private/Public SDN-DC) -- desired. SO <-> MC IaaS intent-based intra-DC network instantiation API (besides Network, also supports Compute/Storage as described in the previous section).

Note 1:

Cloud Agnostic Intent Execution
- Intent: High Performance Intra-DC data plane networking
- Several Realization Options per Cloud – the final realization choice could be based on least current resource usage or other criteria
  - Overlay in SmartNIC
  - Gateway in SmartNIC
  - Overlay in DPDK-based switch/router
  - Gateway in DPDK-based switch/router
  - Overlay in ToR
  - Gateway in a ToR
  - Gateway in a HW appliance
- Realizations which are fixed
  - Underlay maps to ToR/Network Fabric
  - No CPU usage for data plane networking maps to VMs/Containers with SR-IOV support
- 1) OOF → MC runtime check for a specific cloud region which can support capability/capacity/cost metrics (enhance current capacity check API appropriately)
- 2) MC processes request from OOF
  - Retrieve target Cloud Region specific policy from "Policy"
  - Evaluate each of the cloud specific options in the policy from a perspective of resource allocation, utilization and cost
  - Return to OOF the option which minimizes <resource allocation, utilization and cost> and the net value of the option
    - In case the requirement cannot be met, return an appropriate error
  - Cache the option in MC for the specific cloud region as part of the VNF deployment workflow
- 3) OOF processes the response from MC and picks the target cloud region which maximizes the net value
- 4) SO → MC VNF deployment request is processed by MC
  - MC looks up the cache for the target cloud region using the VNF deployment request details
  - MC replaces appropriate details in the cloud specific template (Heat, ARM etc.) based on the chosen option in the cache
  - MC deploys workload on target cloud region with dynamically modified cloud specific template
  - MC removes the cache entry for the specific cloud region for the specific VNF deployment
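
Steps 1) to 4) above can be sketched as follows. This is a toy model, not the MC or OOF implementation: the class and method names, the option fields, the `{{network_option}}` template placeholder, and the definition of net value as the negated sum of allocation, utilization and cost are all assumptions.

```python
class MultiCloud:
    def __init__(self, region_policies):
        # region -> list of realization options retrieved from "Policy".
        self._policies = region_policies
        self._cache = {}

    @staticmethod
    def _combined(option):
        # Assumed scalarization of <resource allocation, utilization, cost>.
        return option["allocation"] + option["utilization"] + option["cost"]

    def check(self, region, vnf_id):
        """Step 2: pick the cheapest option for a region and cache it."""
        options = self._policies.get(region, [])
        if not options:
            raise LookupError(f"requirement cannot be met in {region}")
        best = min(options, key=self._combined)
        self._cache[(region, vnf_id)] = best
        # Assumed net value: lower combined score means higher value.
        return best["name"], -self._combined(best)

    def deploy(self, region, vnf_id, template):
        """Step 4: patch the template from the cache, then drop the entry."""
        option = self._cache.pop((region, vnf_id))
        return template.replace("{{network_option}}", option["name"])


def pick_region(mc, regions, vnf_id):
    """Steps 1 and 3: query MC per region, keep the highest net value."""
    results = {}
    for region in regions:
        try:
            results[region] = mc.check(region, vnf_id)
        except LookupError:
            continue  # this region cannot meet the requirement
    if not results:
        raise LookupError("no cloud region can meet the requirement")
    return max(results, key=lambda r: results[r][1])
```

In this sketch, cache entries for regions that OOF does not select are never removed; a real implementation would presumably need an expiry or explicit cleanup for them.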

Cloud Resource Partitioning for Differentiated QoS

Value:

Applicable to all use cases
Casablanca Targets:
- vCPE (Enable Tiered service offering); 5G Network Slicing (Stretch Goal)

Edge Automation Requirement:

Support three types of slices in the Cloud Infrastructure (Definition Reference: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/)

Guaranteed Resource Slice (hard isolation) for various infra Resources (CPU/Memory/Network)
- Max (limit), Min (request) are the same; resource guarantee is "Max"
- Maps to 5G Applications such as Connected Car which fall in the category of ultra-reliable machine-type communications (ref. 1)
Burstable Resource Slice (soft isolation) for various infra Resources
- Min (request) <= Max (limit); resource guarantee is "Min"
- Maps to a Burstable Network Slice such as > 1 Gbps broadband, which falls in the category of extreme mobile broadband (ref. 1)
Best Effort Resource Slice (no isolation) for various infra Resources
- No Min (request) ; resource guarantee is "None"
- Maps to 5G Applications such as IoT which fall in the category of massive machine-type communications (ref. 1)
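
The three slice definitions above reduce to a comparison of Min (request) and Max (limit). A minimal sketch follows; the function name is hypothetical, and treating a request without a limit as Burstable is an assumption borrowed from the Kubernetes QoS classes referenced above.

```python
def classify_slice(request, limit):
    """Classify a resource slice from its Min (request) and Max (limit).

    None means "not set". Returns "Guaranteed", "Burstable" or "BestEffort".
    """
    if request is None:
        # No Min: nothing is guaranteed.
        return "BestEffort"
    if limit is None:
        # Assumption: a Min without a Max is still a soft-isolated slice.
        return "Burstable"
    if request > limit:
        raise ValueError("Min (request) must not exceed Max (limit)")
    # Min == Max gives hard isolation; Min < Max guarantees only Min.
    return "Guaranteed" if request == limit else "Burstable"
```

For example, `classify_slice(4, 4)` is `"Guaranteed"`, `classify_slice(2, 4)` is `"Burstable"`, and `classify_slice(None, 4)` is `"BestEffort"`.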

Implementation:

Leverage current HPA framework with appropriate extensions

References:

https://metis-ii.5g-ppp.eu/wp-content/uploads/white_papers/5G-RAN-Architecture-and-Functional-Design.pdf
Driving Superior Isolation for Tiered Services using Resource Reservation -- Optimization Policies for Residential vCPE
- https://jira.onap.org/browse/OPTFRA-240

Note:

Any VMs/Containers which are part of a resource slice will adhere to the specs of the resource slice

ONAP Component	Life Cycle Phase	Enhancements
Policy	Design	Configuration Policies for Guaranteed, Burstable & Best Effort Cloud Infrastructure Resource Slices (these will apply to VMs/Containers also). Placement Policies for Resource Slices: higher (programmable) weight to a Cloud Region which supports all three types of resource slices vs. only two types (Guaranteed/Best Effort).
Multi-Cloud	Deploy	Resource Slice Capability Discovery
A&AI	Deploy	Resource Slice Capability per Cloud Region: Guaranteed/Burstable/Best Effort. Resource Slice Type: Guaranteed/Burstable/Best Effort.
OOF	Deploy	Execute Resource Slice Placement Policies for Optimized Service/VNF Placement across Cloud Regions
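
The programmable weighting in the Policy row can be sketched as a simple ranking. The function name, the default weights, and the tie-break by region name are assumptions; this is not the OOF policy model.

```python
SLICE_TYPES = frozenset({"Guaranteed", "Burstable", "BestEffort"})


def rank_regions(region_slice_types, required_type,
                 full_support_weight=2.0, partial_support_weight=1.0):
    """Rank cloud regions for a workload needing `required_type`.

    Regions lacking the required slice type are excluded. Regions that
    support all three slice types get the higher (programmable) weight.
    """
    weights = {}
    for region, supported in region_slice_types.items():
        if required_type not in supported:
            continue
        full = SLICE_TYPES <= set(supported)
        weights[region] = full_support_weight if full else partial_support_weight
    # Highest weight first; region name breaks ties deterministically.
    return sorted(weights, key=lambda r: (-weights[r], r))
```

For instance, given a region supporting only Guaranteed/Best Effort and another supporting all three types, a Guaranteed workload ranks the fully capable region first, while a Burstable workload sees only that region.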

Aggregated Infrastructure Telemetry Streams

Value

Edge Infrastructure Analytics complementing 5G VNF Analytics

ONAP, as of R2, collects statistics/alarms/events from workloads (VMs) and takes closed-loop control actions such as healing a process, scale-out, restart, etc. In R3, infrastructure-related statistics/alarms/events will also be collected to generate actionable insights and take life cycle actions on the workloads. Infrastructure statistics normally include performance counters, NIC counters, and IPMI information on a per physical server node basis. To reduce the load on ONAP, aggregated (summarized) information should be sent to ONAP from the edge-clouds.

As part of this activity, the intention is to create an aggregation micro-service that collects data from physical nodes (over collectd and other mechanisms), aggregates the information (time-based aggregation, threshold-based aggregation, silencing, etc.) based on configurable rules, and exports the aggregate data to DCAE. This micro-service can be instantiated in one of three ways: by ONAP itself at ONAP-central using OOM (one or more instances for the edge-clouds), at the edge-cloud using its own deployment tools, or by edge service providers at the regional site level.
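
A minimal sketch of time-based and threshold-based aggregation as described above. It is plain Python for illustration and does not use Prometheus or collectd; the sample format, field names, and the `aggregate` function are assumptions.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, window_s, threshold):
    """Summarize raw node samples into one record per time window.

    samples: iterable of (timestamp_s, node, value) tuples.
    Time-based aggregation: values are grouped into fixed windows of
    window_s seconds across all nodes.
    Threshold-based aggregation: a window is flagged when its average
    exceeds `threshold`.
    """
    buckets = defaultdict(list)
    for ts, _node, value in samples:
        window_start = int(ts // window_s) * window_s
        buckets[window_start].append(value)

    summary = []
    for window_start in sorted(buckets):
        values = buckets[window_start]
        avg = mean(values)
        summary.append({
            "window_start": window_start,
            "avg": avg,
            "max": max(values),
            "alert": avg > threshold,
        })
    return summary
```

Only the per-window records would be exported to DCAE, so the volume sent to ONAP depends on the window size rather than on the number of physical nodes or the sampling rate.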

Impacted projects (development activities)

ONAP Component	Enhancements
Overall	Define models to represent summarized information (alerts/statistics/events) for various groups, such as CPU usage, memory usage, file descriptor usage, NIC utilization, various HPA features, etc.
Multi-Cloud	Development activities: Prometheus-based monitoring & summation. Support for collectd for statistics collection from NFVI nodes. Support for a VES agent to send the aggregate data to DCAE (used when the aggregation service is instantiated outside of ONAP control). Support for a DMaaP agent to send the aggregate data to DCAE (normally used if the aggregation service is instantiated at ONAP-Central). Ability to add new plugins (to collect statistics as well as to export aggregated information). Ability to upload recording and alert rules (per edge-cloud or per set of edge-clouds). Ability to auto-clean the time series DB (based on the size allocated for this micro-service). Edge-cloud registration time (as part of ESR): check whether the registration data indicates that the aggregation service is to be brought up; if so, inform the aggregation micro-service to authenticate and listen for statistics from that edge-cloud. Run time: collect the information (support for both pull and push), apply rules, generate alarms, and export them via VES, DMaaP, or any other plugins in future.
AAI & ESR	Development activities: Enhancements to ESR to indicate whether the aggregation service is required at ONAP for this edge-cloud. Enhancements to ESR to tell Multi-Cloud to listen for connections and statistics requests from the edge-clouds, including information such as the CA cert to use to authenticate the remote party, or any other UN/PWD method.
PORTAL	ESR portal related changes to take information about the edge-cloud (CA Cert and UN/PWD information)
DCAE & DMaaP	None expected (to be confirmed)

Life Cycle stages related functions

ONAP Component	Life cycle phase	Activities
AAI and ESR	Deploy & Run time	Add/Modify/Delete recording and alerting rules
AAI and ESR	Run time	Add/Modify/Delete Edge-cloud information
Multi-Cloud	Run time	Get edge information from A&AI whenever an edge-cloud is added or removed. Prepare to wait for information from that edge-cloud. Receive information from the edge-cloud and put it in the time series DB. Summation based on recording & alerting rules. Export information to DCAE via DMaaP or VES.

ONAP Edge Analytics with DCAE/DMaaP independent of closed loop

Value

5G Analytics

ONAP Component	Life cycle phase	Enhancements
OOM - ONAP Central	Deploy	Separate ONAP-edge instance per 'edge domain' (i.e., separate from the onap-central instance). Note: independent of any Edge CP's orchestration components. The SP uses a central OOM with a 'policy' for deployment of an onap-edge instance, e.g., xyz edge provider with abc components. However, the onap-edge instance can be 'lighter weight', with only the subset of components needed (per the MVP discussed below). Desirable to manage it as a separate K8s cluster (i.e., separate from the onap-central instance) used only for onap-edge, i.e., not for other 'workloads' like network apps or 3rd party apps. Central OOM to deploy the following for the ONAP edge instance: DMaaP with mirror capability.
