...

Assumptions


| # | Assumption | Notes |
|---|---|---|
| 1 | When a DMI restarts, all cm-handles related to that DMI will be considered to have trust-level COMPLETE | Temporary assumption until the 'Audit' function has been implemented |

Issues and Decisions

| # | Issue | Notes | Decision |
|---|---|---|---|
| 1 | How fast should CPS (and DB) be able to process max heartbeat failures? | Is 60K really realistic? If ENM goes down we should get a notification for each node, do we? | PoC has shown 60 seconds is reasonable |
| 2 | Restart of NCMP | Should/can this be handled? | As of now, no such case is being considered; there should be no re-registration flow |
| 3 | Does the DMI Plugin provide NCMP with a health check URL during registration, or do we just rely on the default one provided with Spring Boot Actuator? | Document the contract. It's just the interface that matters, not the implementation. | Spring Boot Actuator interface |
| 4 | Look for the dmi data service (dmiDataPlugin) for the healthcheck | | |


Description

  1. Define scenarios which cause a CM Handle to go stale.
  2. Implement changes to support tracking of CM Handle Freshness/Staleness.

What might trigger a cmHandle to go to STALE?

  1. dmi plugin identifies that the device is no longer contactable.
  2. dmi plugin identifies that an underlying device manager managing the device (node) is out of sync with the device itself. 

...

| # | Interface | Requirement | Additional Information | Sign-off |
|---|---|---|---|---|
| 1 | CPS-NCMP-E-05 | The 'trustlevel' is visible on all REST methods that currently include the 'cm handle state' | existing endpoints | |
| 2 | CPS-NCMP-E-05 | CM Handles can be queried (filter condition) on 'trustlevel' | using a new 'trustLevel' condition (cannot use cpsPath condition) | |
| 3 | CPS-NCMP-I-01 | During registration, DMI plugin can report initial trustlevel. | If the state is not 'complete', it should be considered a 'Trustlevel change' (see req. 5). The initial trust level is backward compatible: if not set, we assume trustlevel is 'complete'. For a new cm-handle where the trustlevel is 'complete' this is NOT considered a change and no notifications should be sent. | |
| 4 | CPS-NCMP-E-05 | Once DMI (plugin) is detected to be down, the trust-level for all affected CM Handles should be set to 'NONE'. This will also lead to many notifications as per req. #5 | This might lead to a high level (20K) of notifications (need to discuss capabilities) | |
| 5 | CPS-NCMP-E-05.e | NCMP notification shall be sent when the trustlevel changes | Notifications are sent externally via Kafka. Many small or bulk: agreed on many notifications, one for each cm-handle | |
| 6 | CPS-NCMP-I-01.e | It shall be possible to report any trustlevel of one CM Handle | DMI plugin can report the current trustLevel of a single cm handle id, i.e. the DMI can tell NCMP the trustLevel is 'NONE' when a node heartbeat failure is detected and 'COMPLETE' once it is restored. Again, this should lead to notifications on the external interface as per req. #5 | |

 

Error Handling

| # | Error Scenario | Expected behavior | Sign-off |
|---|---|---|---|
| 1 | NCMP restart (all instances) | To be discussed, not sure if it can/should be handled. TrustLevels should be 'NONE' and need to be restored using an audit-request (not in scope) | |


Characteristics

| # | Parameter | Expectation | Notes | Sign-off |
|---|---|---|---|---|
| 1 | dmi-down detection speed | 30 seconds | | |
| 2 | device heartbeat frequency (message emitted by DMI plugin for each device) | 60 seconds | | |
| 3 | maximum supported devices (by NCMP) | 60,000 | Given #2 and #3 this means NCMP needs to process 60,000 messages / minute! | |
| 4 | maximum number of cm-handles down reported by DMI in one request and/or per minute | 30,000 / minute | a peak can be processed within 60 seconds | |
| 5 | processing time for all trustLevel updates on DMI-Down and/or peak load by DMI | 1 second | | |
| 6 | If trustLevel is incorporated into the searches endpoints, the speed should not be impacted | | | |
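The load implied by the figures above can be checked with a little arithmetic. The following is only a back-of-the-envelope sketch: the constant and function names are illustrative, and it assumes every device emits exactly one heartbeat message per interval.

```python
# Figures taken from the characteristics listed above.
MAX_DEVICES = 60_000              # maximum supported devices (by NCMP)
HEARTBEAT_INTERVAL_SECONDS = 60   # device heartbeat frequency
DMI_PEAK_PER_MINUTE = 30_000      # cm-handles reported down by one DMI per minute


def messages_per_second(message_count, window_seconds):
    """Average message rate over the given time window."""
    return message_count / window_seconds


# Assumption: one heartbeat message per device per interval.
steady_rate = messages_per_second(MAX_DEVICES, HEARTBEAT_INTERVAL_SECONDS)
peak_rate = messages_per_second(DMI_PEAK_PER_MINUTE, 60)

print(f"steady heartbeat load: {steady_rate:.0f} messages/second")
print(f"single-DMI peak load:  {peak_rate:.0f} messages/second")
```

Under these assumptions the steady load is 1,000 messages per second across all NCMP instances, and a single DMI peak adds 500 per second.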


...

  1. This epic will only introduce trustLevel NONE and COMPLETE. PARTIAL and POOR may be added later as below.
  2. Re-registration, i.e. resolving trustLevel degradation, is not in scope of this epic. NCMP will not send notifications on trustLevel changes for external consumers.

High Level Interactions

[draw.io diagram: Staleness Freshness Overview]

| Interface | Name | Trigger | Description | Type | Endpoint or Topic | Schema |
|---|---|---|---|---|---|---|
| 1 | HealthCheck | 30 second interval (configurable) | NCMP is to perform a health check against each of the DMI Plugins | REST | http://<dmiPluginServiceName>/manage/health. This endpoint will be the standard health check endpoint provided by Spring Boot Actuator. We don't store it anywhere; we just document it for now. | |
| 2 | CMHandle trust level change | A CMHandle managed by DMI Plugin's trust level has changed | data contains {trustLevel: ENUM}; event id is cmhandle id in kafka header | Kafka | kafka topic: dmi-device-heartbeat | <cloudEvents-header> id : <cmhandleId>, type : org.onap.cm.events.trustlevel-notification; data : { trustlevel : "COMPLETE/NONE" } |
| 3 | TrustLevel Request (CMHandle Query API with trustLevel Query Condition) | Client Request | CmHandle is to be returned based on the values in the CMHandle Trust Map | REST | 1. http://<host>:<port>/ncmp/v1/ch/id-searches 2. http://<host>:<port>/ncmp/v1/ch/searches | { "cmHandleQueryParameters": [ { "conditionName": "cmHandleWithTrustLevel", "conditionParameters": [ {"trustLevel": "COMPLETE"} ] } ] } |

Managing

...

TrustLevel

DMI Plugins

  1. NCMP checks the health of every DMI Plugin through interface 1 every 30 seconds and records the result in the DMI Trust Map.
  2. If a DMI Plugin goes down, that DMI Plugin's trust level is updated to NONE in the DMI Trust Map.
    1. The CM handles corresponding to that DMI Plugin should be set to NONE.
  3. If a DMI Plugin comes back up, the trust level is set back to COMPLETE for that DMI Plugin only.

    More details of the health check URL can be found in:
    CPS-1857 Document watchdog job impl. with health check URL
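The watchdog steps above can be sketched as follows. This is a minimal in-memory illustration, not the NCMP implementation: the function names, the plain dictionaries standing in for the DMI Trust Map and CM Handle Trust Map, and the idea of deriving an "effective" trust level at read time are all assumptions. The actual HTTP call to the health endpoint is left out, so the health result is passed in as a boolean.

```python
from enum import Enum


class TrustLevel(Enum):
    NONE = "NONE"
    COMPLETE = "COMPLETE"


def apply_health_check(dmi_trust_map, dmi_name, is_up):
    """Record one health check result; return True if the DMI trust level changed."""
    new_level = TrustLevel.COMPLETE if is_up else TrustLevel.NONE
    changed = dmi_trust_map.get(dmi_name) != new_level
    dmi_trust_map[dmi_name] = new_level
    return changed


def effective_trust_level(dmi_trust_map, cm_handle_trust_map, cm_handle_to_dmi, cm_handle_id):
    """A cm handle is NONE while its DMI is down, otherwise its own recorded level.

    Assumption: unknown entries default to COMPLETE (backward compatible).
    """
    dmi_name = cm_handle_to_dmi[cm_handle_id]
    if dmi_trust_map.get(dmi_name, TrustLevel.COMPLETE) is TrustLevel.NONE:
        return TrustLevel.NONE
    return cm_handle_trust_map.get(cm_handle_id, TrustLevel.COMPLETE)


# Hypothetical data: two cm handles managed by one DMI plugin.
dmi_trust_map = {"dmi-1": TrustLevel.COMPLETE}
cm_handle_trust_map = {"ch-1": TrustLevel.COMPLETE, "ch-2": TrustLevel.NONE}
cm_handle_to_dmi = {"ch-1": "dmi-1", "ch-2": "dmi-1"}

went_down = apply_health_check(dmi_trust_map, "dmi-1", is_up=False)
while_down = effective_trust_level(dmi_trust_map, cm_handle_trust_map, cm_handle_to_dmi, "ch-1")

came_back = apply_health_check(dmi_trust_map, "dmi-1", is_up=True)
after_recovery = effective_trust_level(dmi_trust_map, cm_handle_trust_map, cm_handle_to_dmi, "ch-2")
```

Keeping the per-handle level separate from the DMI level means that when the DMI recovers only the DMI entry is restored, and a handle that had its own heartbeat failure (ch-2 here) stays at NONE.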

...

CMHandle Heartbeat

  1. It is the responsibility of the DMI Plugins to update NCMP about the heartbeat of CMHandles.
  2. Through interface 2, DMI Plugins will provide a Kafka event when the trustworthiness state of a CMHandle changes.
    1. NCMP receives this event and updates the CM Handle Trust Map accordingly.
  3. NCMP needs to be able to handle a throughput of 60,000 state changes per minute for 2 instances.
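The event handling described above might look roughly like this. It is a sketch under stated assumptions: the function name is hypothetical, the event is represented as already-decoded header and data dictionaries (no Kafka client is involved), the cm handle id is read from the `id` header and the level from the `trustlevel` data field as shown for interface 2, and a handle without a recorded level is assumed to be COMPLETE.

```python
ALLOWED_TRUST_LEVELS = ("NONE", "COMPLETE")


def handle_trust_level_event(cm_handle_trust_map, headers, data):
    """Apply one trust level event; return True if the stored level changed.

    A True result is the point at which a trustlevel-change notification
    would be due; sending it is outside this sketch.
    """
    cm_handle_id = headers["id"]
    new_level = data["trustlevel"]
    if new_level not in ALLOWED_TRUST_LEVELS:
        raise ValueError(f"unsupported trust level: {new_level}")
    # Assumption: handles without an entry are treated as COMPLETE.
    previous_level = cm_handle_trust_map.get(cm_handle_id, "COMPLETE")
    cm_handle_trust_map[cm_handle_id] = new_level
    return previous_level != new_level


# Hypothetical events for a single cm handle.
cm_handle_trust_map = {}
event_headers = {"id": "ch-1", "type": "org.onap.cm.events.trustlevel-notification"}

first = handle_trust_level_event(cm_handle_trust_map, event_headers, {"trustlevel": "NONE"})
repeat = handle_trust_level_event(cm_handle_trust_map, event_headers, {"trustlevel": "NONE"})
restored = handle_trust_level_event(cm_handle_trust_map, event_headers, {"trustlevel": "COMPLETE"})
```

Returning whether the level actually changed lets the caller skip notifications for repeated events that carry the same level.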

...

  1. Body of request will be in the format as below:

    Search Trust Level Request Body:
    {
      "cmHandleQueryParameters": [
        {
            "conditionName": "cmHandleWithTrustLevel",
            "conditionParameters": [ {"trustLevel": "COMPLETE"} ]
        }
      ]
    }


    Two endpoints are subject to this query:
    http://<host>:<port>/ncmp/v1/ch/id-searches
    http://<host>:<port>/ncmp/v1/ch/searches

  2. Interface 3
  3. NCMP will first check trust level query parameters to determine which trust level (NONE, COMPLETE) is being searched.
    1. if the target trust level is NONE
      1. The cm handles stored in the CM Handle Trust Map with trust level NONE will be returned.
    2. if the target trust level is COMPLETE
      1. If the DMI that manages the CMHandle is marked as untrustworthy, then NONE is returned.
      2. If that DMI is trustworthy, the cm handles for that DMI stored in the CM Handle Trust Map with trust level COMPLETE will be returned.
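The search flow above can be illustrated with a small filter over an in-memory map. This is a sketch only: the function names are hypothetical, the CM Handle Trust Map is modelled as a plain dictionary of cm handle id to trust level string, and the DMI-level check from the steps above is omitted for brevity.

```python
def requested_trust_level(request_body):
    """Extract the target trust level from a search request body, or None."""
    for condition in request_body.get("cmHandleQueryParameters", []):
        if condition.get("conditionName") == "cmHandleWithTrustLevel":
            for parameter in condition.get("conditionParameters", []):
                if "trustLevel" in parameter:
                    return parameter["trustLevel"]
    return None


def search_cm_handle_ids(cm_handle_trust_map, request_body):
    """Return the sorted cm handle ids whose trust level matches the request."""
    target = requested_trust_level(request_body)
    if target is None:
        # No trust level condition given: nothing to filter on.
        return sorted(cm_handle_trust_map)
    return sorted(
        cm_handle_id
        for cm_handle_id, level in cm_handle_trust_map.items()
        if level == target
    )


# Hypothetical map contents and the request body format shown above.
cm_handle_trust_map = {"ch-1": "COMPLETE", "ch-2": "NONE", "ch-3": "COMPLETE"}
request_body = {
    "cmHandleQueryParameters": [
        {
            "conditionName": "cmHandleWithTrustLevel",
            "conditionParameters": [{"trustLevel": "COMPLETE"}],
        }
    ]
}

complete_ids = search_cm_handle_ids(cm_handle_trust_map, request_body)
```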