Attendees: Bin Yang, Ethan Lynn, Gueyoung Jung, Matti Hiltunen, Bin Hu, Huang Haibin, Sudhakar Reddy, xinhuili


1, Input on re-scheduling the MC weekly meeting

11am Beijing Time (GMT+8) could be the 1st option (which day of the week is still open)

Another option is to keep the current weekly meeting time slot

A poll will be set up for this re-scheduling


2, OOF -> MC capacity check:


Background/context:


The OOF capacity_check workflow works on a best-effort basis to optimize the placement/homing of a VNF. However, there are numerous reasons why the instantiation of a VNF might fail, and an inappropriate placement decision is only one of them. This issue should be resolved in a bigger context/workflow that handles the failure of a VNF instantiation.


Resource reservation could be an option to secure the placement/homing decision, but the workflow on that front would be complicated as well: OOF/SO/MC/VF-C would all be involved to handle the reservation/cancel/commit of resources. So it is not in scope for Dublin yet.



Workflow: https://wiki.onap.org/pages/viewpage.action?pageId=45306917


OOF issues the capacity_check API call once for each VNF (not for each VDU placement)


MultiCloud returns a list of the valid cloud regions. Each cloud region is associated with available capacity info at AZ level (with AZ names) only when that info is available for the cloud region. This implies that, for certain cloud regions in the list, the AZ-level available capacity info could be missing; OOF is expected to treat those cloud regions as having infinite available capacity in every AZ.
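
A minimal sketch of how OOF could interpret such a response, assuming an illustrative payload shape (the field names below are placeholders, not a finalized schema): any cloud region without AZ-level info is treated as unconstrained.

```python
# Illustrative only: the response field names below are placeholders, not a finalized schema.
INFINITE = float("inf")

def az_candidates(capacity_response):
    """Yield (cloud_region_id, az_name, available_vcpu) tuples from a capacity_check response."""
    for region in capacity_response:
        az_list = region.get("availability-zones")
        if not az_list:
            # No AZ-level capacity info for this region: treat every AZ as having infinite capacity.
            yield region["cloud-region-id"], None, INFINITE
            continue
        for az in az_list:
            yield region["cloud-region-id"], az["az-name"], az.get("vCPU", INFINITE)
```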


OOF selects the cloud region and AZ name, and passes the AZ name to SO, which forwards it to MC. This is the same approach used for passing the flavor selection from OOF->SO->MC.
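
As an illustration only (the actual OOF->SO->MC directive schema is not defined in these minutes, so the field names and values below are hypothetical), the selection handed to SO could carry the AZ name alongside the flavor:

```python
# Hypothetical shape of the homing selection passed OOF -> SO -> MC;
# the real directive schema is defined elsewhere and may differ.
homing_selection = {
    "cloud-owner": "CloudOwner",
    "cloud-region-id": "RegionOne",
    "flavor-directive": {"flavor_label_1": "m1.large"},  # existing flavor-passing approach
    "availability-zone": "nova-az1",                     # AZ name selected by OOF
}
```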



API spec:


Request:

The same as current v0/capacity_check

(https://onap.readthedocs.io/en/latest/submodules/multicloud/framework.git/docs/specs/multicloud_resource_capacity_check.html)
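
As a rough illustration, a v0-style request could look like the sketch below. The endpoint path and field names are reproduced from memory and should be verified against the linked spec; the MSB host is a placeholder.

```python
import requests

# Rough sketch of a v0-style capacity check call; verify the endpoint path and
# the field names against the linked v0/capacity_check spec before relying on them.
payload = {
    "vCPU": 4,                          # vCPUs required by the VNF
    "Memory": 8192,                     # memory (MB)
    "Storage": 100,                     # storage (GB)
    "VIMs": ["vim-id-1", "vim-id-2"],   # candidate VIMs to check
}
resp = requests.post(
    "http://msb.onap.example/api/multicloud/v0/check_vim_capacity",
    json=payload,
    timeout=30,
)
print(resp.json())
```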


Response:

VIM (cloud-region) list with a list of AZ capacity information for each VIM
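
A hedged sketch of what the proposed response could look like; the exact field names were not fixed in this meeting, so everything below is a placeholder shape:

```python
# Placeholder shape for the proposed response: a list of valid VIMs (cloud regions),
# each optionally carrying AZ-level available capacity. Field names are illustrative.
proposed_response = [
    {
        "cloud-owner": "CloudOwner",
        "cloud-region-id": "RegionOne",
        "availability-zones": [
            {"az-name": "az-1", "vCPU": 32, "Memory": 65536, "Storage": 1000},
            {"az-name": "az-2", "vCPU": 16, "Memory": 32768, "Storage": 500},
        ],
    },
    {
        # No AZ-level info available for this region; OOF treats it as unconstrained.
        "cloud-owner": "CloudOwner",
        "cloud-region-id": "RegionTwo",
    },
]
```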



API version:

Ethan suggests keeping v0/capacity_check intact, while v1/capacity_check would reflect the changes/upgrades. This is one option to be discussed further.

Bin: I would suggest that v1/capacity_check support a “consistent ID of a cloud region”, identifying a VIM by the composite key {cloud-owner},{cloud-region-id} (instead of {vim-id} as in v0/capacity_check).
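
To illustrate the suggested identification change (the values are examples only):

```python
# v0 identifies a VIM by a single {vim-id} string
v0_vim = "CloudOwner_RegionOne"

# Suggested v1 identification: composite key {cloud-owner},{cloud-region-id}
v1_vim = {"cloud-owner": "CloudOwner", "cloud-region-id": "RegionOne"}
```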



Azure plugin:

There is no capacity info at the infrastructure level, but the limit/quota at the subscription level could still be useful. So the Azure plugin will expose that kind of information to OOF as well.
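
A minimal sketch of the idea, assuming hypothetical subscription-level quota figures (the helper, field names, and numbers below are illustrative, not the actual Azure plugin code):

```python
# Illustrative only: translate subscription-level limits/quotas into a
# capacity-style headroom view that could be exposed to OOF.
subscription_quota = {
    "Total Regional vCPUs": {"limit": 100, "current": 36},
    "Standard DSv3 Family vCPUs": {"limit": 50, "current": 12},
}

def remaining_vcpus(quota):
    """Return the remaining vCPU headroom per quota family at the subscription level."""
    return {name: q["limit"] - q["current"] for name, q in quota.items()}

print(remaining_vcpus(subscription_quota))
```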


11 Comments

  1. Hi Bin Yang ,

    I have the following queries; could you please clarify?

    • The check_vim_capacity API input consists of cpu, mem, and storage. Is there any API to filter based on HPA attributes or flavor tags?
    • Also, can the filter be fine-tuned based on provider network names/ids, the available free pool of floating IPs, neutron ports, etc.?
    • Is there a reserve API available to reserve resources for a finite amount of time?
    • What's the guarantee that the fetched info holds good until a resource allocation request originates from SO? I.e., between time instant t1 (capacity check) and t2 (instantiation request), if the resources have been consumed either by backdoor cloud access or by another 3rd-party NFV-O/cloud, what's the mitigation plan?
    • Are there different APIs for the capability check (e.g., is DPDK available?) and the capacity check (e.g., is enough RAM available?)?
    • Does this apply to K8S-based cloud regions as well?


    BR,

    Viswa

    1. Hi Viswanath Kumar Skand Priya, thanks for asking. First of all, I would like to give a bit of a holistic view of the interaction between OOF and MC. The interaction between OOF/MC occurs in 2 ways (for now): the 1st is the capability check, indirectly via AAI; the 2nd is the capacity check API, directly. The indirect interaction allows MC to expose the capabilities of a cloud region (which are relatively static) via AAI, and OOF checks those when placing a VNF. That workflow is realized as part of the HPA functional requirement: HPA Architecture & Design Considerations. The 2nd interaction allows OOF to query the capacity information of a cloud region, which can be very dynamic at run-time.

      It seems we can follow these queries as Q&A:

      • The check_vim_capacity API input consists of cpu, mem, and storage. Is there any API to filter based on HPA attributes or flavor tags?
        Bin: As mentioned above, filtering/querying on HPA attributes/flavor tags is realized indirectly via AAI. Please refer to HPA Policies and Mappings for how HPA/flavor information is mapped between OOF Policy/AAI/OpenStack to support such indirect interaction.
      • Also, can the filter be fine-tuned based on provider network names/ids, the available free pool of floating IPs, neutron ports, etc.?
        Bin: Yes and no. These filters will follow the HPA approach as well. In Casablanca there was a trial to filter based on the SRIOV-NIC network: HPA - SR-IOV NIC design by Huang Haibin
      • Is there a reserve API available to reserve resources for a finite amount of time?
        Bin: Not yet; as stated in the background/context, it is beyond the scope of the Dublin release.
      • What's the guarantee that the fetched info holds good until a resource allocation request originates from SO? I.e., between time instant t1 (capacity check) and t2 (instantiation request), if the resources have been consumed either by backdoor cloud access or by another 3rd-party NFV-O/cloud, what's the mitigation plan?
        Bin: As stated in the background, resource reservation should help on that front, but it is not yet realized. There is no mitigation plan, since "the OOF capacity_check workflow works on a best-effort basis to optimize the placement/homing of a VNF" and "this issue should be resolved in a bigger context/workflow that handles the failure of a VNF instantiation".
      • Are there different APIs for the capability check (e.g., is DPDK available?) and the capacity check (e.g., is enough RAM available?)?
        Bin: Again, the capability check is done via the AAI API.
      • Does this apply to K8S-based cloud regions as well?
        Bin: According to the proposal for the K8S plugin of MC, it is supposed to support these requirements as well. I will leave it to Victor Morales to confirm.


      1. Thanks Bin Yang. One follow-up query: does this capability/capacity check by MC require admin privileges on the underlying cloud? Can MC, as a normal tenant user, get all this information from the cloud?

        1. Very good question.

          It depends on how much detail/accuracy you want. Right now, a normal OpenStack tenant user can support this workflow with some loss of accuracy (since it lacks the privilege to get some hypervisor information from the cloud), and it is also not able to consume the SRIOV-NIC (physical) network.

          On the other hand, admin privilege is too much for ONAP as a MANO.

          I would suggest having a MANO role in between for ONAP as a MANO, which allows ONAP to query certain privileged information and to create/change/delete certain privileged resources (e.g., provider networks). (Since MC mediates every operation to the VIM/cloud, we can easily enumerate all privileged operations from ONAP and come up with a comprehensive requirement for such a MANO role.)

          1. That would be good. If you could give me the comprehensive requirement, I can verify with our Ops/Sec folks whether such role-based access can be provided in a prod environment. Based on their response we can fine-tune it. I believe the ops requirements wouldn't vary much between SPs; I can at least see how this would work out in our cloud environment.

            1. Sure, I will put it on my TO DO list. Will reach out to you once I have something to share.

      2. Bin Yang Viswanath Kumar Skand Priya These are valid observations on the gaps that exist in the current workflows for evaluating capabilities and capacities. We had in-depth discussions on this topic when the API was proposed in Beijing, and the following link should give you more details on the thinking that went into it at that time: OOF - MultiCloud interaction in R2. Unfortunately, most of the todos from then were deferred to a later release, as we had to prioritize other things in Beijing and Casablanca.

        To add to Bin's note regarding the combined evaluation of capacity and capability, I believe the HPA - Telemetry OOF and A&AI proposal in Casablanca was a first step towards this. 

        1. Thanks Shankar for sharing this information. I would like to follow the progress on HPA - Telemetry OOF and A&AI: will there be any meeting to drive this requirement?

          1. Bin, while I was certainly involved in the discussions, Srinivasa Addepalli and Dileep Ranganathan have been driving this in Casablanca. They'd be the best folks to chart out the plans for this in Dublin.

          2. Hi Bin,

            We could not complete the work on Prometheus-based monitoring across cloud regions.

            We also came to an understanding that a centralized A&AI DB with periodic dynamic updates of the current capacity information can be a challenge for A&AI DB performance.

            So, the current plan of action is something like this:

            • Work on distributed monitoring during the Dublin time frame.
            • Store the running capacity information in each Edge/Cloud-region location.
            • Provide/enhance the OOF API that gets the capacity information.
            • OOF to use the above information in determining the HPA telemetry.

            Srini


            1. Hi Srinivasa Addepalli, thanks for the information. I think maintaining the capacity near the target (the edge, for edge clouds) makes more sense and offers more flexibility. Please copy me on the discussion loops so that I can help from both the Infrastructure and MultiCloud sides. Thanks, and enjoy your vacation.