Summary: Edge Scoping 




Distributed Edge Cloud Infrastructure Object Hierarchy (Stretch Goal)

Value:

References: 



ONAP Component | Life Cycle Phase | Enhancements
Multi-Cloud | Deploy

Support Distributed Cloud Infrastructure Capability Discovery (Note 1, Note 2)

A&AI | Deploy

Support Standardized Distributed Cloud Infrastructure Object Hierarchy & Capability Database (Ref. 1)

  • Loose coupling between HW objects (private cloud) and SW objects (private and public clouds)
  • Includes Standardized Capabilities across clouds & Capabilities unique to certain clouds
  • Note:
    • Multi-Cloud Distributed Cloud Infrastructure Capability Discovery process will populate the aforementioned database
OOF | Deploy

Execute Distributed Cloud Infrastructure Placement Policies for Optimized Service/VNF Placement across Cloud Regions (Note 3, Note 4)


SO | Deploy

Extend SO ↔ OOF API to support data opaque to SO (Note 5)

Extend SO ↔ MC API to support data opaque to SO (Note 6)

Assumption for Policy, SO, OOF:

Note 1: 

Note 2:

Note 3:

Note 4:

Note 5:

Note 6:

Cloud-agnostic Placement/Networking & Homing Policies (Phase 1 - Casablanca MVP, Phase 2 - Stretch Goal)

End-to-end use case Applicability:

Value:


Phase 1 Summary:

Phase 2 Summary (Build on Phase 1 Work):

References: 

The sequence diagram below expands "Multi-Cloud/VNFM Deploy Apps" in Edge Scoping Sequence Diagram

Cloud Agnostic Intent (Policy) Execution Workflow:


Follow up :

//Support the current simple capacity check API alongside the intent-based framework for backward compatibility. 
//If a cloud region does not support the policy-based interface, it is given a high net value, assuming the current capacity API (yes/no) 
//returns a yes. This ensures smooth migration to the new policy-based framework. 
{
"service": "cloudPolicy",
"policyName": "oofMulti-cloudCasablanca.cloudPolicy_vCPE_VNF",
"description": "Cloud Policy for vCPE VNF",
"templateVersion": "0.0.1",
"version": "oofMulti-cloudCasablanca",
"priority": "3",
"riskType": "test",
"riskLevel": "2",
"guard": "False",

"content": 
{
	"cloudOwner" : 
	{
		"owner": "All",//default is all, it can be a specific cloud owner such as Azure, VMware VIO, Wind River Titanium Cloud etc.
		"intent": 
		{
			"name": "Infrastructure High Availability (HA) for VNF", 
				//realization through OpenStack-based: anti-affinity; Azure: Fault Domain; or
				//different anti-affinity models from ETSI -- host-level, rack-level, availability zone level
				//max-count in heat template - scale out factor
				//server-group in heat template - usable through API and CLI in OpenStack, VMware VIO
		}
		"intent": 
		{
			"name": "Infrastructure Resource Isolation for VNF", 
				// realization possible without dedicating CPU and Memory, refer to section on "Cloud Resource Partitioning for Differentiated QoS" 
				// on how this can help in offering tiered services
			"qosProperty": 
			{
				{"Burstable QoS": "TRUE", "Burstable QoS Oversubscription Percentage": "25"}
				//{"Burstable QoS": "TRUE", "Burstable QoS Oversubscription Percentage": "25"}, {"operator", "OR"}, {"Guaranteed QoS": "TRUE"}
				// VMware VIO - tenant VDC CLI and API - configure the appropriate settings per tenant
				// Burstable QoS is specified through min guarantee (part of flavor metadata), see
				// https://docs.openstack.org/horizon/latest/admin/manage-flavors.html
			}
		}
		"cloudCapacityUtilizaitonAttributes" : 
		{
			//current_allocated_capacity is normalized to 1
			//max value for cpu or memory is 1 if usage is greater than or equal to the limit
			"current_allocated_capacity" :
			{ 
				{"cpu", "memory", "disk"}: "tenant (OpenStack Project or Resource Slice)", 
			},

			// under discussion - elaborate capacity, utilization checks for various objects
			//"current_allocated_capacity" : 
			//{ 
			//	{"cpu", "memory", "network"}: 
			//	{"cloud": {"weight": "0.85", "threshold": "0.9"}, 
			//	{"tenant (resource slice)": {"weight": "0.85", "threshold": "0.9" }, 
			//	{"host aggregate (resource cluster)": {"weight": "0.85", "threshold": "0.9"},
			//},
			//"average_utilization" : { {"cpu", "memory", "network"}: {"cloud": {"weight": "0.13"}, "tenant (resource slice)": {"weight": "0.13"}, 
			//	"host aggregate (resource cluster)": {"weight": "0.13"} }, "time-window": "24", "unit": "hours" },
			//"peak_utilization" : { {"cpu", "memory", "network"}: {"cloud": {"weight": "0.02"}, "tenant (resource slice)": {"weight": "0.02"}, "host 
			//	aggregate (resource cluster)": {"weight": "0.02"} }, "time-window": "24", "unit": "hours" }
			
			//current_allocated_capacity, average_utilization and peak_utilization are normalized to 1
			
			//For a given object such as tenant_cpu, sum of weights across all attributes (current_allocated_capacity, average_utilization & 
			//peak_utilization) must be 1
			//E.g. net_value = cloud_cpu_current_allocated_capacity*0.85 + cloud_cpu_average_utilization*0.13 + cloud_cpu_peak_utilization*0.02 + ...
			
			//For a given object such as cloud_cpu, if current_allocated_capacity exceeds the specified "threshold" value, return "high net 
			//value" 
		}
	}

	//use cloud provider in – <cloud region id, cloud provider> – different cloud providers may need different capacities for the same VNF
	"cloudOwner" : 
	{
		"owner": "Azure",
		"capacityProperty": 
		{ 		 
			//under discussion – "capabilityProperty": {SR_IOV, ...} 
			//under discussion - host network bandwidth
		
			"controller": "multicloud",
			"request": //from R2
			"{\"vCPU\": {\"quantity\": {\"get_param\": \"REQUIRED_VCPU\"}}, \"Memory\": {\"quantity\": {\"get_param\": \"REQUIRED_MEM\"}, 
			\"unit\": \"GB\"}, \"Storage\": {\"quantity\": {\"get_param\": \"REQUIRED_DISK\"}, \"unit\": \"GB\"}}"
		}
		
		"owner": "OpenStack",
		"capacityProperty": 
		{ 		 
			//under discussion – "capabilityProperty": {SR_IOV, ...} 
			//under discussion - host network bandwidth
		
			"controller": "multicloud",
			"request": //from R2
			"{\"vCPU\": {\"quantity\": {\"get_param\": \"REQUIRED_VCPU\"}}, \"Memory\": {\"quantity\": {\"get_param\": \"REQUIRED_MEM\"}, 
			\"unit\": \"GB\"}, \"Storage\": {\"quantity\": {\"get_param\": \"REQUIRED_DISK\"}, \"unit\": \"GB\"}}"
		}
	}
}

"resources": ["vGMux"], //R2 support status – single VF module assumption per VNF 
"applicableResources": "any",
"identity": "distance-vGMux",
"policyScope": ["vCPE", "US", "INTERNATIONAL", "ip", "vGMux"],
"policyType": "AllPolicy"
}
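The net value calculation described in the policy comments can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the OOF or Multi-Cloud implementation: the function names, the dictionary layout, and the choice of 1.0 as the "high net value" (the top of the normalized range) are all hypothetical. It also reads the backward compatibility note as "a legacy region whose capacity check returns yes stays eligible but receives the high net value".

```python
# Hypothetical sketch of the weighted net value from the policy comments.
# HIGH_NET_VALUE = 1.0 is an assumption: the top of the normalized range.
HIGH_NET_VALUE = 1.0

ATTRIBUTES = ("current_allocated_capacity", "average_utilization", "peak_utilization")


def net_value(metrics, weights, threshold):
    """Weighted sum of normalized metrics for one object (e.g. cloud_cpu).

    metrics and weights map each attribute name to a number; the weights
    must sum to 1, as required by the policy comments.
    """
    if abs(sum(weights[a] for a in ATTRIBUTES) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Threshold check: an over-allocated object gets the high net value.
    if metrics["current_allocated_capacity"] > threshold:
        return HIGH_NET_VALUE
    # Each metric is capped at 1 because values are normalized to 1.
    return sum(weights[a] * min(metrics[a], 1.0) for a in ATTRIBUTES)


def legacy_net_value(capacity_check_passed):
    """Region that only supports the simple yes/no capacity check API.

    Returns the high net value on "yes" and None (not a candidate) on "no".
    """
    return HIGH_NET_VALUE if capacity_check_passed else None


weights = {
    "current_allocated_capacity": 0.85,
    "average_utilization": 0.13,
    "peak_utilization": 0.02,
}
cloud_cpu = {
    "current_allocated_capacity": 0.5,
    "average_utilization": 0.4,
    "peak_utilization": 0.9,
}
value = net_value(cloud_cpu, weights, threshold=0.9)
```

With the weights from the commented example (0.85, 0.13, 0.02), the sample metrics give 0.5*0.85 + 0.4*0.13 + 0.9*0.02 = 0.495.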

Private Cloud Setup (e.g. OpenStack)

 VNFC to Instance Type Mapping

Operator Configuration – Multi-VIM/Cloud Plugin

The operator/service provider who uses ONAP will choose which VIMs to use and include the appropriate MultiVIM plugins in their ONAP deployment. For example, let’s assume they pick private OpenStack, private VMware, and public Azure as the platforms to run their services on.

For each MultiVIM plugin, the operator configures the following information:

OOF → Multi-VIM/Cloud Policy API - Key Processing Steps

For each cloud owner

OOF → Multi-VIM/Cloud Policy API - Other

SO → Multi-VIM/Cloud 

Policy Management

Optimization

Each service specifies a service-specific objective function that is stored as part of the service-specific policy and is used by OOF to evaluate the candidates. To keep the example simple, let’s consider a service that consists of only one VNF instance. The objective function has two components:

- distance from customer location to the VNF - the service designer assigns a weight for the distance: wd

- the cost of deploying the VNF in a location - the service designer assigns a weight for the cost: wc

OOF optimizes function: min (wd*distance + wc*cost)

If the service does not care about the cost at all, it would set wc = 0. If the service designer wants to minimize cost only, they could set wd = 0. Note that candidates that are too far away can be eliminated by a distance constraint even before the optimization. For example, if the service has a distance constraint of at most 100 kilometers, then only those cloud regions within 100 kilometers of the customer location would be considered in the objective function evaluation.

If the service designer wants to trade off between distance and cost, they might, for example, set wd = 1, wc = 2. This would mean that a $1 increase in price weighs as much as 2 kilometers in distance.

Candidate 1: $100, 100 kilometers => value: 300

Candidate 2: $150, 80 kilometers => value: 380

Candidate 3: $50, 190 kilometers => value: 290  <- pick this one 
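The candidate evaluation above can be reproduced with a short Python sketch. It is illustrative only and assumes a hypothetical `Candidate` type and helper names; it is not the OOF implementation. The optional distance constraint is applied before the objective function is evaluated, as described earlier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    cost_usd: float
    distance_km: float


def objective(c: Candidate, wd: float, wc: float) -> float:
    """Objective from the text: wd*distance + wc*cost (lower is better)."""
    return wd * c.distance_km + wc * c.cost_usd


def pick_best(candidates: List[Candidate], wd: float, wc: float,
              max_distance_km: Optional[float] = None) -> Optional[Candidate]:
    """Filter by the distance constraint, then minimize the objective."""
    eligible = [c for c in candidates
                if max_distance_km is None or c.distance_km <= max_distance_km]
    if not eligible:
        return None
    return min(eligible, key=lambda c: objective(c, wd, wc))


candidates = [
    Candidate("Candidate 1", cost_usd=100, distance_km=100),
    Candidate("Candidate 2", cost_usd=150, distance_km=80),
    Candidate("Candidate 3", cost_usd=50, distance_km=190),
]
values = [objective(c, wd=1, wc=2) for c in candidates]  # [300, 380, 290]
best = pick_best(candidates, wd=1, wc=2)                  # Candidate 3
```

If a 100 kilometer distance constraint were also applied to these candidates, Candidate 3 would be filtered out first and Candidate 1 (value 300) would be selected instead.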


Cloud Resource Partitioning for Differentiated QoS (Combined with Previous)

Value:

References:

Edge Automation Requirement:

Support three types of slices in the Cloud Infrastructure (Definition Reference: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/)
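For reference, the three Kubernetes QoS classes that the slice types borrow from can be derived from container CPU and memory requests and limits. The sketch below follows the rules in the referenced Kubernetes page; the input layout (a list of dicts with `requests` and `limits`) and the function name are assumptions made for illustration, and quantities are compared as plain strings rather than parsed.

```python
def qos_class(containers):
    """Classify a pod as Guaranteed, Burstable, or BestEffort.

    containers: list of dicts, each with optional "requests" and "limits"
    dicts keyed by "cpu" and "memory" (hypothetical layout).
    Simplification: quantities are compared as strings, so "1" and
    "1000m" are treated as different values.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes defaults a missing request to the limit.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


guaranteed_pod = [{"requests": {"cpu": "1", "memory": "1Gi"},
                   "limits": {"cpu": "1", "memory": "1Gi"}}]
burstable_pod = [{"requests": {"cpu": "500m"}}]
best_effort_pod = [{}]
```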

Implementation:

References:

Note:


ONAP Component | Life Cycle Phase | Enhancements
Policy | Design

Configuration Policies for Guaranteed, Burstable & Best Effort Cloud Infrastructure Resource Slices (this will apply to VMs/Containers also)

Placement Policies for Resource Slices

  • Higher (programmable) weight to Cloud Region which supports all three types of resource slices vs only two types of resource slices (Guaranteed/Best Effort)
Multi-Cloud | Deploy | Resource Slice Capability Discovery
A&AI | Deploy

Resource Slice Capability per Cloud Region

  • Guaranteed/Burstable/Best Effort

Resource Slice Type

  • Guaranteed/Burstable/Best Effort
OOF | Deploy

Execute Resource Slice Placement Policies for Optimized Service/VNF Placement across Cloud Regions
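The placement policy in the table (a higher, programmable weight for cloud regions that support all three slice types) could be sketched as below. The weight values, region names, and function names are hypothetical placeholders, not values defined by ONAP.

```python
ALL_SLICE_TYPES = frozenset({"Guaranteed", "Burstable", "Best Effort"})
TWO_SLICE_TYPES = frozenset({"Guaranteed", "Best Effort"})


def slice_support_weight(supported, full_weight=1.0, partial_weight=0.5):
    """Return a programmable weight based on supported slice types.

    full_weight and partial_weight are assumed defaults; an operator
    would set them through policy.
    """
    supported = set(supported)
    if ALL_SLICE_TYPES <= supported:
        return full_weight
    if TWO_SLICE_TYPES <= supported:
        return partial_weight
    return 0.0


def rank_regions(region_capabilities, **weights):
    """Order cloud regions from highest to lowest slice-support weight."""
    return sorted(
        region_capabilities,
        key=lambda region: slice_support_weight(region_capabilities[region], **weights),
        reverse=True,
    )


regions = {
    "region-a": ["Guaranteed", "Best Effort"],
    "region-b": ["Guaranteed", "Burstable", "Best Effort"],
}
ranking = rank_regions(regions)
```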

Aggregated Infrastructure Telemetry Streams (Aligns with HPA requirements, Combining efforts with HPA)

Value


As in R2, ONAP collects statistics/alarms/events from workloads (VMs) and takes closed-loop control actions such as healing a process, scaling out, or restarting. In R3, infrastructure-related statistics/alarms/events will also be collected to generate actionable insights and take life cycle actions on the workloads. Infrastructure statistics normally include performance counters, NIC counters, and IPMI information on a per-physical-server-node basis. To reduce the load on ONAP, aggregated (summarized) information should be sent to ONAP from the edge-clouds. 

As part of this activity, the intention is to create an aggregation micro-service that collects data from physical nodes (over collectd and other mechanisms), aggregates the information (time-based aggregation, threshold-based aggregation, silencing, etc.) based on configurable rules, and exports the aggregated data to DCAE. This micro-service can be instantiated in several ways: by ONAP itself using OOM, with one or more instances for edge-clouds at ONAP-central; at the edge-cloud using its own deployment tools; or by edge service providers at the regional site level.
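The aggregation steps named above (time-based aggregation, threshold-based aggregation, silencing) can be illustrated with a small Python sketch. It is a simplified stand-in, assuming samples arrive as `(timestamp, node, value)` tuples; the function names and data layout are hypothetical, and the actual micro-service is expected to rely on tools such as Prometheus and collectd rather than code like this.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_window(samples, window_s):
    """Time-based aggregation: average each node's samples per window.

    samples: iterable of (timestamp_s, node, value) tuples.
    Returns {(window_start_s, node): mean_value}.
    """
    buckets = defaultdict(list)
    for timestamp_s, node, value in samples:
        window_start = int(timestamp_s // window_s) * window_s
        buckets[(window_start, node)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def threshold_alerts(aggregates, threshold, silenced=frozenset()):
    """Threshold-based aggregation with silencing.

    Returns sorted (window_start_s, node, value) tuples whose aggregated
    value exceeds the threshold, skipping silenced nodes.
    """
    return sorted(
        (window, node, value)
        for (window, node), value in aggregates.items()
        if value > threshold and node not in silenced
    )


# Normalized CPU usage samples from two physical nodes.
samples = [
    (0, "node-1", 0.5),
    (30, "node-1", 0.7),
    (60, "node-1", 0.95),
    (10, "node-2", 0.99),
]
aggregates = aggregate_by_window(samples, window_s=60)
alerts = threshold_alerts(aggregates, threshold=0.9)
quiet_alerts = threshold_alerts(aggregates, threshold=0.9, silenced={"node-2"})
```

Only the aggregated values and alerts would be exported to DCAE, which is how the load on ONAP is kept lower than forwarding every raw sample.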

Impacted projects (development activities)

ONAP Component | Enhancements
Overall
  • Define models to represent summarized information (Alerts/Statistics/Events) for various groups
  • Define various groups such as CPU usage, memory usage, file descriptor usage, NIC utilization, various HPA features, etc.
Multi-Cloud
  • Development activities:
    • Prometheus based monitoring & summation
    • Support for collectd for statistics collection from NFVI nodes.
    • Support for VES agent to send the aggregate data to DCAE (Used when the aggregate service is instantiated outside of ONAP control)
    • Support for DMaaP agent to send the aggregate data to DCAE (Normally used if the aggregate service is instantiated at the ONAP-Central)
    • Provide ability to add new plugins (to collect statistics as well as to export aggregation information)
    • Provide ability to upload the recording and alert rules (on per edge-cloud basis or set of edge-clouds basis)
    • Provide ability to auto-clean the time series DB (based on the size allocated for this micro-service)
  • Edge-Cloud registration time (as part of ESR)
    • Check whether the registration data indicates that the aggregation service is to be brought up. If so, inform the aggregation micro-service to authenticate and listen for statistics from that edge-cloud.
  • Run time
    • Collects the information (support for both pull/push).
    • Apply rules
    • Generate alarms
    • Export them via VES, DMaaP, or any other plugins added in the future.
AAI & ESR
  • Development activities
    • Enhancements to ESR to indicate whether aggregation service is required for this edge-cloud at the ONAP.
    • Enhancements to ESR to tell Multi-Cloud to listen for connections and statistics requests from the edge-clouds, including information such as the CA cert used to authenticate the remote party or any other UN/PWD method.
PORTAL | ESR portal related changes to take information about the edge-cloud (CA Cert and UN/PWD information)
DCAE & DMaaP | None expected??

Life Cycle stages related functions

ONAP Component | Life cycle phase | Activities
AAI and ESR | Deploy & Run time
  • Add/Modify/Delete recording and alerting rules
AAI and ESR | Run time
  • Add/Modify/Delete Edge-cloud information
Multi-Cloud | Run time
  • Get Edge information from A&AI whenever Edge-Cloud is added or removed.
  • Prepare to wait for information from that Edge-cloud
  • Receive information from edge-cloud and put it in the time series DB.
  • Summation based on recording & alerting rules
  • Export information to DCAE via DMaaP or VES

ONAP Edge Analytics with DCAE/DMaaP independent of closed loop (Beyond Casablanca)

Value

ONAP Component | Life cycle phase | Enhancements
OOM - ONAP Central | Deploy
  • Separate ONAP-edge instance per 'edge domain' (i.e., separate from the onap-central instance)
    • Note: Independent of any Edge CP's Orchestration components.
  • SP uses a central-OOM with a 'policy' for deployment of an onap-edge instance, e.g., xyz edge provider with abc components, etc.
    • However, onap-edge instance can be 'lighter weight' with subset of components needed (per MVP discussed below)
    • Desirable to manage it as a separate K8s cluster (i.e., separate from the onap-central instance) used only for onap-edge, i.e., not for other 'workloads' such as network apps or 3rd party apps
  • Central OOM to deploy the following ONAP edge instance
    • DMaaP with mirror capability


Multi-Cloud Deployment in Edge Cloud (Stretch Goal)

Value: