AAI Performance Target to Support Holmes Alarm Processing

References

AAI-2375
2019-04-11 AAI Developers Meeting agenda item "AAI too slow for Holmes"

Discussion

Guangrong Fu mentioned AAI in Baseline Measurements based on Testing Results:

Cache the AAI data and refresh them periodically so that Holmes won't have to make an HTTP call to AAI every time it tries to correlate one alarm to another.

The problem with caching is knowing when to update the cached data. Even though access may be fast for Holmes, it risks using out-of-date data, in which case the correlations will be wrong anyway. Also, duplicating the AAI data outside of AAI is probably a bad architectural decision. Making AAI faster for these use cases would be better.
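The staleness risk described above can be illustrated with a minimal sketch of a time-based (TTL) cache. This is not how Holmes or AAI implement caching; the class TtlCache, the fake_fetch function and the sample inventory are hypothetical names and data introduced only for illustration. The clock is injected so the behaviour can be shown without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Cache values per key and re-fetch once an entry is older than ttl_seconds."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical function that would perform the AAI call
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the example does not have to sleep
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # served from the cache, possibly out of date
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


# Simulated inventory standing in for AAI (illustrative data only).
inventory = {"pnf-1": {"status": "active"}}
calls = []


def fake_fetch(key: str) -> dict:
    calls.append(key)
    return dict(inventory[key])  # copy, as a real HTTP response would be


fake_now = [0.0]
cache = TtlCache(fake_fetch, ttl_seconds=60.0, clock=lambda: fake_now[0])

first = cache.get("pnf-1")                # miss: fetched from the inventory
inventory["pnf-1"]["status"] = "deleted"  # the source of truth changes
fake_now[0] = 30.0
stale = cache.get("pnf-1")                # still inside the TTL: old value returned
fake_now[0] = 61.0
fresh = cache.get("pnf-1")                # TTL expired: value is fetched again
```

Between the change and the expiry of the entry, any correlation based on the cached value would use the old status. A shorter TTL narrows that window but sends more requests to AAI.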

Has there been a performance analysis of where the time is spent? Could it help to use ElasticSearch (e.g. as in sparky)? Should Holmes have a batch interface to get more AAI data in fewer calls? Or a better correlation API that results in fewer calls?

31st Oct: https://lists.onap.org/g/onap-discuss/topic/27805753

1st Nov:

Guangrong Fu will try custom queries for the queries that took too long to return
The hardware (mainly storage) influences the query speed - need to find out what hardware the speed test was conducted on (Guangrong Fu will provide HW specs)
HOLMES-186

Would the AAI Cacher (AAI-1337) help to improve performance?

5th Mar: Guangrong Fu

Hi,

Sorry for my late response. It took me a long time to set up AAI in my own env. For Item 10, here's some information:

Main APIs invoked in Holmes for different use cases:

VoLTE

Getting the VM query URL via: /search/nodes-query?search-node-type=vserver&filter=vserver-name:EQUALS: - once
Getting VM info via: the URL returned by the query above - once
Getting the VNF data via: network/generic-vnfs/generic-vnf - once

CCVPN

Updating terminal point via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - once
Getting logical links via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - 3 times
Getting VPN binding info via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - once
Getting connectivity info via: /network/vpn-bindings/vpn-binding/{vpnId} - once
Getting service instance info via: /network/connectivities/connectivity/{connectivityId} - once

Performance

We set up an AAI env on a VM (8 cores, 16GB memory, 160GB storage) following the guidance at https://wiki.onap.org/display/DW/How+to+Docker+setup+on+Single+VM+HEAT+Deployment and ran a VNF query using "/aai/v11/cloud-infrastructure/cloud-regions/cloud-region/example-cloud-owner-val-45051/example-cloud-region-id-val-56689/tenants/tenant/example-tenant-id-val-51834/vservers/vserver/example-vserver-id-val-51834" (which is returned by "/search/nodes-query?search-node-type=vserver&filter=vserver-name:EQUALS:") 1000 times. It took ~95ms per query. We also queried a VNF 1000 times via "/aai/v11/network/generic-vnfs/generic-vnf/example-vnf-id-val-92494", and the average time was ~86ms.

From the result, we know that even a single request costs around 100ms, and several requests are sent to AAI when Holmes processes an alarm. Taking CCVPN as an example, up to 7 requests are made for each alarm. That means it'll take around 600-700 ms for Holmes to interact with AAI. During alarm storms, it is hard for AAI to support such intensive queries.
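The estimate above can be reproduced with a short calculation. The call counts come from the CCVPN list and the per-query averages from the two measurements; the throughput figure is only a rough bound under the assumption that the requests are sent one after another and that AAI time dominates the processing of an alarm.

```python
# Number of AAI requests per CCVPN alarm, taken from the CCVPN list of calls.
ccvpn_calls = {
    "update terminal point": 1,
    "get logical links": 3,
    "get VPN binding info": 1,
    "get connectivity info": 1,
    "get service instance info": 1,
}

# Measured average time per query in milliseconds (the two tests above).
measured_ms = (86, 95)

total_calls = sum(ccvpn_calls.values())
low_ms = total_calls * min(measured_ms)
high_ms = total_calls * max(measured_ms)

# Assumption: one worker, strictly sequential requests, no other processing cost.
alarms_per_second = 1000 / high_ms

print(f"{total_calls} calls -> {low_ms}-{high_ms} ms per alarm")
print(f"about {alarms_per_second:.1f} alarms/s per sequential worker")
```

With these inputs the per-alarm cost falls in the 600-700 ms range quoted above.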

6th March: Guangrong Fu

In my opinion, the performance of AAI queries is not only impacted by the computation inside AAI, but also impacted by the HTTP request itself.

I've done another test: I sent requests to the health check API of Holmes (which does nothing but return immediately after it receives a request). The average time cost was also ~70ms. So the problem seems to be the time spent setting up and releasing HTTP connections.
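One way to check the connection-overhead hypothesis is to compare a new connection per request against a single reused (keep-alive) connection. The sketch below does this against a local stand-in server using only the Python standard library; it is not the Holmes health check or the client Holmes uses, and the path /healthcheck, the handler and the helper average_ms are hypothetical. Absolute numbers on a loopback interface will differ greatly from the measurements above, so only the relative difference is of interest.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Stand-in endpoint that returns immediately, like a health check."""
    protocol_version = "HTTP/1.1"      # needed for keep-alive connections
    disable_nagle_algorithm = True     # avoid delays from small separate writes

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # silence per-request logging
        pass


def average_ms(host, port, n, reuse):
    """Return (successful responses, average ms per GET), with or without reuse."""
    shared = http.client.HTTPConnection(host, port, timeout=5) if reuse else None
    ok = 0
    start = time.perf_counter()
    for _ in range(n):
        conn = shared if reuse else http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/healthcheck")
        resp = conn.getresponse()
        resp.read()                    # drain the body so the connection can be reused
        if resp.status == 200:
            ok += 1
        if not reuse:
            conn.close()               # forces a new TCP connection next time
    elapsed_ms = (time.perf_counter() - start) * 1000 / n
    if shared is not None:
        shared.close()
    return ok, elapsed_ms


server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

ok_new, ms_new = average_ms(host, port, 50, reuse=False)
ok_reused, ms_reused = average_ms(host, port, 50, reuse=True)

server.shutdown()
server.server_close()

print(f"new connection per request: {ms_new:.2f} ms")
print(f"reused connection:          {ms_reused:.2f} ms")
```

If the gap between the two figures is large in the real deployment, connection pooling on the client side may recover part of the per-request cost without any change to AAI.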

6th March: Keong

Regarding these queries:

Getting logical links via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - 3 times
Getting VPN binding info via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - once

What depth is used on these GET calls? If they are defaulting to depth=0, then perhaps some improvement can be made by using "depth=1" or "depth=2"? Fewer calls returning more data could improve overall performance.

The same could be achieved by changing to a Nodes query, e.g.

GET /aai/v14/nodes/p-interfaces?interface-name=nodeId-{pnfName}-ltpId-{ifName}
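The two request forms discussed here can be built as follows. This is a sketch only: the helper names and the sample values "pnf-1" and "eth0" are hypothetical, the v14 prefix is taken from the example above, and whether a given depth actually returns the logical links and VPN binding data in a single response has not been verified here.

```python
from urllib.parse import quote, urlencode

AAI_BASE = "/aai/v14"   # API version taken from the Nodes query example


def interface_name(pnf_name: str, if_name: str) -> str:
    """Interface name in the nodeId-{pnfName}-ltpId-{ifName} form used by Holmes."""
    return f"nodeId-{pnf_name}-ltpId-{if_name}"


def p_interface_url(pnf_name: str, if_name: str, depth: int = 0) -> str:
    """Resource URL for one p-interface with an explicit depth parameter."""
    path = (f"{AAI_BASE}/network/pnfs/pnf/{quote(pnf_name, safe='')}"
            f"/p-interfaces/p-interface/"
            f"{quote(interface_name(pnf_name, if_name), safe='')}")
    return f"{path}?{urlencode({'depth': depth})}"


def p_interface_nodes_query_url(pnf_name: str, if_name: str) -> str:
    """Nodes query URL that looks the p-interface up by its interface-name."""
    query = urlencode({"interface-name": interface_name(pnf_name, if_name)})
    return f"{AAI_BASE}/nodes/p-interfaces?{query}"


# Hypothetical sample values, for illustration only.
resource_url = p_interface_url("pnf-1", "eth0", depth=1)
nodes_url = p_interface_nodes_query_url("pnf-1", "eth0")
print(resource_url)
print(nodes_url)
```

Building the depth into the URL in one place makes it easy to try depth=1 and depth=2 and compare the number of calls and the response times for each.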

Question1: Can the Bulk API be used with GET calls? Documentation does not show any examples of GET actions. https://onap.readthedocs.io/en/casablanca/submodules/aai/aai-common.git/docs/AAI%20REST%20API%20Documentation/bulkApi.html

Question2: Would it help to have the Holmes pod co-located with the AAI haproxy and AAI resources pods? Reduced network latency could improve overall performance.

Guangrong: Holmes is actually deployed by DCAE, so I'm not sure whether your proposal is feasible. What's more, the performance data I got was measured with Holmes and AAI deployed on the same VM, sharing the same docker env.
