References

CPS-2146

Assumptions

# | Assumption | Notes
1 | Proposed solution is not a quick fix, but allows for future scaling of NCMP. |

Issues & Decisions

# | Issue | Notes | Decision
1 | Java Streams API for CPS and NCMP | CPS-2146 Using Java Streams to reduce memory consumption in CPS and NCMP |
2 | Remove Hazelcast from NCMP Module Sync | Implementation proposal TBA |
3 | Remove Hazelcast for Trust Level | Implementation proposal TBA |
4 | Remove use of Postgres arrays in Repository methods | CPS-1574: Remove 32K limit from DB operations (Proposal 1) |
5 | Replace Hibernate with JDBC (via Spring Data JDBC) | Implementation proposal TBA |


Background

CPS and NCMP have much higher memory consumption than required. Regarding NCMP specifically, it has some in-memory data structures that grow linearly with the number of CM-handles.

Regarding CPS-core, there is a more fundamental problem in that CPS path queries could return any amount of data - it will be unknown to the application until a query is executed. Some solutions will be proposed for CPS path queries to reduce memory use.

This study and implementation proposals will target concrete steps to reduce memory consumption.

Analysis

A number of issues leading to high memory usage have been identified.

NCMP CM Handle Queries

NCMP CM Handle Queries are directly implicated in CPS-2146, as the Out Of Memory errors occur during the NCMP Search and ID Search functions.

See CPS-2146 Using Java Streams to reduce memory consumption in CPS and NCMP for analysis & solution to reduce memory consumption during these operations.
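As a rough illustration of why streaming can lower peak memory, the sketch below contrasts collecting every matching ID into a list with handing IDs to a consumer one at a time. It is not the CPS-2146 implementation: the class, the method names and the generated "cm-handle-N" IDs are all hypothetical stand-ins for a real query result.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical names throughout; this is an illustration, not CPS or NCMP code.
final class CmHandleStreamSketch {

    // Stand-in for a lazily produced query result (for example, rows read from a cursor).
    static Stream<String> streamCmHandleIds(final int total) {
        return IntStream.range(0, total).mapToObj(i -> "cm-handle-" + i);
    }

    // Eager variant: all matching IDs are held in one list before the caller sees any of them.
    static List<String> collectMatchingIds(final int total, final String suffix) {
        return streamCmHandleIds(total)
            .filter(id -> id.endsWith(suffix))
            .collect(Collectors.toList());
    }

    // Streaming variant: each matching ID is passed to the consumer as it is produced,
    // so this method never builds a collection of results.
    static void forEachMatchingId(final int total, final String suffix, final Consumer<String> consumer) {
        streamCmHandleIds(total)
            .filter(id -> id.endsWith(suffix))
            .forEach(consumer);
    }

    // Laziness in action: limit() stops the pipeline once enough matches have been found,
    // so a very large source is not fully generated.
    static List<String> firstMatchingIds(final int total, final String suffix, final int maxResults) {
        return streamCmHandleIds(total)
            .filter(id -> id.endsWith(suffix))
            .limit(maxResults)
            .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        forEachMatchingId(100, "7", System.out::println);
        System.out.println(firstMatchingIds(Integer.MAX_VALUE, "7", 3));
    }
}
```

The streaming variant only helps if every layer between the database and the response avoids collecting the results; a single intermediate list reintroduces the peak.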

Hazelcast

The use of Hazelcast (an In-Memory Data Grid) has been identified as a particular source of high memory usage. Some points of interest:

  • In NCMP, Hazelcast is not used as a cache, so idle eviction is not used, and the structures are configured to have 3 backups. It follows that scaling up the deployment (e.g. Kubernetes auto-scaling) would not help in a low-memory situation, as the new instances would also have to store the whole structure.
  • Given Hazelcast is configured for synchronous operation, it is likely to have worse performance than a database solution.
  • There are additional reasons to avoid Hazelcast, since as a distributed asynchronous system, it cannot give strong consistency guarantees like an ACID database - it is prone to split brain among other issues.
  • I advise against the use of Hazelcast for future development in NCMP - CPS API should be used.
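The effect of the backup setting in the first point can be sketched with some simple arithmetic. The sketch assumes Hazelcast's usual partition model, in which each entry has one primary copy plus the configured number of backups, each held on a different member, and it assumes a deployment of up to four instances. The class name and the CM handle count are hypothetical.

```java
// Assumption: one primary copy per entry plus `backupCount` backups, each on a distinct member.
// All names and figures here are illustrative only.
final class HazelcastFootprintSketch {

    // Number of copies of each entry held across the whole cluster.
    // A cluster cannot hold more copies of an entry than it has members.
    static int copiesPerEntry(final int backupCount, final int memberCount) {
        return Math.min(1 + backupCount, memberCount);
    }

    // Average number of entries (primary or backup) resident on a single member.
    static long averageEntriesPerMember(final long entryCount, final int backupCount, final int memberCount) {
        return entryCount * copiesPerEntry(backupCount, memberCount) / memberCount;
    }

    public static void main(final String[] args) {
        final long cmHandles = 50_000; // hypothetical inventory size
        final int backups = 3;         // as configured in NCMP, per the text above
        for (int members = 1; members <= 4; members++) {
            System.out.println(members + " member(s): "
                + averageEntriesPerMember(cmHandles, backups, members) + " entries per member");
        }
    }
}
```

Under these assumptions each of the one to four members holds all 50,000 entries, which is why adding an instance in that range does not reduce the memory needed per instance.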

The following is an overview of Hazelcast structures in CPS and NCMP, along with recommendations.

Component | Hazelcast Structure | Type | Purpose | Recommendation | Implementation Proposal | Notes
CPS | anchorDataCache | Map<String, AnchorDataCacheEntry> | | Needs further analysis | |
NCMP | moduleSyncWorkQueue | BlockingQueue<DataNode> | | Remove | TBA | Entire CM handles are stored in the work queue for module sync. This creates very high memory usage during CM handle registration. The use of this blocking queue likely also causes issues with load balancing during module sync.
NCMP | moduleSyncStartedOnCmHandles | Map<String, Object> | | Remove | TBA | One entry is stored in memory per CM handle in ADVISED state.
NCMP | dataSyncSemaphores | Map<String, Boolean> | | No immediate action, see notes | | Low priority - this map is only populated if data sync is enabled for a CM handle. If the feature is used, it will store one entry per CM handle with data sync enabled.
NCMP | trustLevelPerCmHandle | Map<String, TrustLevel> | | Remove | TBA | One entry is stored in memory per CM handle. This is directly implicated in logs supplied in the investigation of out-of-memory errors in CPS-2146.
NCMP | trustLevelPerDmiPlugin | Map<String, TrustLevel> | | Low risk, see notes | | Low priority - there are only a small number of DMIs, so this structure will not grow large. However, if trustLevelPerCmHandle is being removed, this structure may be removed as part of the same solution.
NCMP | cmNotificationSubscriptionCache | Map<String, Map<String, DmiCmNotificationSubscriptionDetails>> | | Will need further analysis in future; see notes | | This is low priority, as the CM subscription feature is not fully implemented and thus is not in use. It is unclear how much data will be stored in the structure. It is presumed to be low, as this structure will only hold pending subscriptions.

Use of Postgres Arrays in Repository methods

Use of Postgres arrays in JpaRepository methods may be using too much memory. Though it is currently unclear how much of a contributor this is to Out Of Memory errors, it appears in the logs from CPS-2146.

See CPS-1574: Remove 32K limit from DB operations for history of this implementation choice - an alternate solution using batching was proposed.

For example, from the logs of CPS-2146, see this stack trace:

2024-02-28T05:18:25.049Z@eric-oss-ncmp-04@ncmp@Connection leak detection triggered for org.postgresql.jdbc.PgConnection@b358fc9 on thread qtp1699794502-7604, stack trace follows, logger: com.zaxxer.hikari.pool.ProxyLeakTask, thread_name: CpsDatabasePool housekeeper, stack_trace: java.lang.Exception: Apparent connection leak detected
 org.onap.cps.spi.repository.YangResourceRepository.findAllModuleReferencesByDataspaceAndModuleNames(YangResourceRepository.java:111)
 org.onap.cps.spi.impl.CpsAdminPersistenceServiceImpl.validateDataspaceAndModuleNames(CpsAdminPersistenceServiceImpl.java:206)
 org.onap.cps.spi.impl.CpsAdminPersistenceServiceImpl.queryAnchors(CpsAdminPersistenceServiceImpl.java:143)
 org.onap.cps.api.impl.CpsAnchorServiceImpl.queryAnchorNames(CpsAnchorServiceImpl.java:90)
 org.onap.cps.ncmp.api.impl.inventory.InventoryPersistenceImpl.getCmHandleIdsWithGivenModules(InventoryPersistenceImpl.java:174)
 org.onap.cps.ncmp.api.impl.NetworkCmProxyCmHandleQueryServiceImpl.executeModuleNameQuery(NetworkCmProxyCmHandleQueryServiceImpl.java:167)
 org.onap.cps.ncmp.api.impl.NetworkCmProxyCmHandleQueryServiceImpl.executeQueries(NetworkCmProxyCmHandleQueryServiceImpl.java:256)
 org.onap.cps.ncmp.api.impl.NetworkCmProxyCmHandleQueryServiceImpl.queryCmHandleIds(NetworkCmProxyCmHandleQueryServiceImpl.java:71)
 org.onap.cps.ncmp.api.impl.NetworkCmProxyCmHandleQueryServiceImpl.queryCmHandles(NetworkCmProxyCmHandleQueryServiceImpl.java:95)
 org.onap.cps.ncmp.api.impl.NetworkCmProxyDataServiceImpl.executeCmHandleSearch(NetworkCmProxyDataServiceImpl.java:215)
 org.onap.cps.ncmp.rest.controller.NetworkCmProxyController.searchCmHandles(NetworkCmProxyController.java:253)

The code causing the exception in YangResourceRepository is:

    default Set<YangResourceModuleReference> findAllModuleReferencesByDataspaceAndModuleNames(
        final String dataspaceName, final Collection<String> moduleNames) {
        return findAllModuleReferencesByDataspaceAndModuleNames(dataspaceName, moduleNames.toArray(new String[0]));
    }
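
The batching alternative mentioned above can be sketched as follows. This is not the CPS-1574 proposal itself: the class, the method names and the batch size are hypothetical, and the query function passed in stands for whatever repository call would handle one batch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper; illustrates batching in general, not the actual CPS code.
final class BatchedQuerySketch {

    // Splits the input into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(final Collection<T> items, final int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final List<T> all = new ArrayList<>(items);
        final List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < all.size(); from += batchSize) {
            batches.add(all.subList(from, Math.min(from + batchSize, all.size())));
        }
        return batches;
    }

    // Runs the supplied query once per batch and merges the results into one set,
    // so no single query receives more than batchSize parameters.
    static <T, R> Set<R> queryInBatches(final Collection<T> items, final int batchSize,
                                        final Function<List<T>, Collection<R>> queryOneBatch) {
        final Set<R> results = new HashSet<>();
        for (final List<T> batch : partition(items, batchSize)) {
            results.addAll(queryOneBatch.apply(batch));
        }
        return results;
    }

    public static void main(final String[] args) {
        final List<String> moduleNames = List.of("module-a", "module-b", "module-c");
        // The lambda is a stand-in for a repository call that takes one batch of names.
        final Set<String> found = queryInBatches(moduleNames, 2, batch -> new ArrayList<>(batch));
        System.out.println(found.size());
    }
}
```

Each query then carries a bounded number of parameters, at the cost of more round trips to the database, so the batch size is a trade-off to be measured rather than a fixed value.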

Hibernate Entity Cache

Hibernate has an Entity Cache, which can grow large during transactions. While most of CPS-core's Spring JpaRepository methods are using Native SQL, the Entity Manager is still caching in some cases. I propose the removal of Hibernate be investigated as part of a long term solution. (This is not as much work as it sounds: CPS is not directly reliant on Hibernate/JPA - rather Spring Data JPA is used. This could be replaced with Spring Data JDBC with relatively small code changes.)

Note that this change is blocked by CPS-1673. The use of OneToMany mapping in FragmentEntity appears to be the only place where CPS is currently reliant on functionality provided by JPA.