...
# | Jira | Notes | Decision | Status
---|---|---|---|---
1 | | | | Managed by Daniel Hanrahan. See the short-term solutions below.
2 | | Very likely related to #1. | Won't investigate separately; apply the short-term solutions below to #1 and test again. | Separate fix, but will probably contribute to #1 too. The CPS team can close this once the deployment documentation has been updated to reflect this.
3 | | | Won't investigate separately; apply the short-term solutions below to #1 and test again. | Not reproducible. This doesn't seem to be an NCMP server issue, possibly just a once-off general (networking) resource issue. Will be closed.
4 | | Indirectly related to #1. | The ticket relates to an incorrect timeout limit, not the timeout itself. | Managed by Priyank Maheshwari; a solution is currently being tested.
5 | | Not related. | | Investigated by Levente Csanyi.
...
# | Issue | Notes | Decision
---|---|---|---
1 | Increase memory resources of NCMP (Helm chart) | Memory resources of the CPS/NCMP pod should be increased to 4 GB, 5 GB, etc. to determine whether this fixes the OOME in CPS-2146. | Csaba Kocsis (ETH) will test and report back to CPS.
2 | | | ETH has already implemented the fix.
3 | NCMP will implement throttling / rate limiting for the Rest API (e.g. HTTP 503 response) | Requires determining the maximum request rate, e.g. by comparing previous successful and failing tests (such as 3.4.2 vs 3.4.6). A PoC of rate limiting has been created: 20747: |
4 | Rest client (for load tests) will throttle | Depends on the outcome of #3 above. |
5 | Lower thread count for Module Sync | This can be done using the variable NCMP_MODULES_SYNC_WATCHDOG_ASYNC_EXECUTOR_PARALLELISM_LEVEL (default 10). | Csaba Kocsis (ETH) will test and report back to CPS.
6 | | Hazelcast is configured with multiple backups, which are not needed in a deployment with only 2 NCMP instances (2 instances require only 2 copies across the cluster). Testing has shown that matching the number of backups to the cluster size reduces heap usage by around 100 MB during registration of 20K CM handles. | Daniel Hanrahan has provided a patch to reduce memory consumption: https://gerrit.onap.org/r/c/cps/+/137517
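The throttling in items 3 and 4 could take several forms. Below is a minimal sketch of one common technique, a token bucket, in plain Java. It is not the PoC referenced in item 3; the class name TokenBucketRateLimiter, the statusFor helper and the capacity/rate parameters are hypothetical, and real values would have to come from the maximum request rate that item 3 still needs to determine. A rejected request maps to HTTP 503, as item 3 suggests.

```java
import java.util.function.LongSupplier;

/**
 * Minimal token-bucket rate limiter (illustrative sketch, not the CPS PoC).
 * Tokens refill continuously at a fixed rate up to a maximum burst capacity;
 * each admitted request consumes one token. When the bucket is empty the
 * request is rejected, which a Rest layer could map to HTTP 503.
 */
final class TokenBucketRateLimiter {

    private final double capacity;          // maximum burst size, in tokens
    private final double tokensPerNano;     // refill rate, in tokens per nanosecond
    private final LongSupplier nanoClock;   // time source, injectable for tests
    private double tokens;                  // tokens currently available
    private long lastRefillNanos;           // clock reading at the last refill

    TokenBucketRateLimiter(long capacity, double tokensPerSecond, LongSupplier nanoClock) {
        if (capacity <= 0 || tokensPerSecond <= 0) {
            throw new IllegalArgumentException("capacity and rate must be positive");
        }
        this.capacity = capacity;
        this.tokensPerNano = tokensPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity;             // start full, allowing an initial burst
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    TokenBucketRateLimiter(long capacity, double tokensPerSecond) {
        this(capacity, tokensPerSecond, System::nanoTime);
    }

    /** Returns true if the request may proceed, false if it should be rejected. */
    synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Hypothetical helper: the HTTP status a Rest endpoint could return. */
    int statusFor() {
        return tryAcquire() ? 200 : 503;
    }

    // Adds the tokens earned since the last refill, capped at capacity.
    private void refill() {
        long now = nanoClock.getAsLong();
        long elapsed = now - lastRefillNanos;
        if (elapsed > 0) {
            tokens = Math.min(capacity, tokens + elapsed * tokensPerNano);
            lastRefillNanos = now;
        }
    }
}
```

The same class could pace a load-test client (item 4) by waiting and retrying whenever tryAcquire() returns false instead of sending the request. The injectable clock is there only so the refill behaviour can be tested deterministically.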
Background
CPS and NCMP have much higher memory consumption than required. NCMP in particular has some in-memory data structures that grow linearly with the number of CM handles.
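As a rough back-of-envelope estimate, assume NCMP memory grows as M(n) ≈ M_0 + c·n for n CM handles (M_0 and c are illustrative symbols, not measured values). The Hazelcast backup result in the table above (around 100 MB saved during registration of 20K CM handles) then suggests that the redundant backup copies alone contributed on the order of 5 KB per CM handle:

$$
M(n) \approx M_0 + c\,n, \qquad
\frac{\Delta M}{n} \approx \frac{100 \times 10^{6}\ \text{B}}{2 \times 10^{4}} = 5 \times 10^{3}\ \text{B} \approx 5\ \text{KB per CM handle}
$$

Under the same linear assumption, that saving would scale in proportion to the number of CM handles registered.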
...
One avenue worth further investigation is a series of recent performance improvements to CPS and NCMP introduced around 3.4.2:
Version | Jira | Comment | Example performance test
---|---|---|---
3.4.2 | | Improved time performance of CPS store operations (2x or more). | org.onap.cps.integration.performance.cps.WritePerfTest#Writing openroadm data has linear time.
3.4.3 | | Improved time performance of CPS update operations (2x in some cases, stacks with CPS-1795). | org.onap.cps.integration.performance.cps.UpdatePerfTest#Replace single data node and descendants: #scenario.
3.4.3 | | Improved time performance of saving CM handles (over 4x faster, stacks with CPS-1795). | See https://gerrit.onap.org/r/c/cps/+/136932 (the code was changed to remove the slower API, and production code uses the 4x faster API).
3.4.3 | | Improved time performance of CPS queries (5-10x). | org.onap.cps.integration.performance.ncmp.CmHandleQueryPerfTest#CM-handle is looked up by alternate-id.
3.4.6 | | Removed Spring Security, which greatly reduced overhead on Rest requests (over 10x). | A k6 test will be added as part of CPS-1975.
Cumulatively, both read and write speeds are up to 10x faster than previous versions, and overhead on Rest requests is over 10x lower. It is very possible that these improvements are adversely affecting memory usage during load tests.
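One possible mechanism, offered as a hypothesis rather than a measured result, follows from Little's law: the average number of requests held in a system, L, equals the arrival rate λ multiplied by the average time W each request spends there. If a load test offers more requests than NCMP previously admitted, making the Rest layer over 10x cheaper raises the rate λ at which requests reach downstream work (for example database writes or Hazelcast updates). Assuming W stays roughly the same, more requests, and the heap they hold, are in flight at once:

$$
L = \lambda W, \qquad \lambda \to 10\lambda,\ W \text{ unchanged} \;\Rightarrow\; L \to 10L
$$

If the downstream stage cannot keep up, W grows as well and the backlog grows further, which is the situation the throttling in items 3 and 4 of the short-term solutions is meant to prevent.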
...