
...

#	Issue	Notes	Decision
1	How fast should CPS (and DB) be able to process the maximum number of heartbeat failures?	Is 60K really realistic? If ENM goes down, should we get a notification for each node?	PoC has shown 60 seconds is reasonable
2	Restart of NCMP	Should/Can this be handled?	As of now, no such case is being considered.
3	Does the DMI plugin provide NCMP with a health check URL during registration, or do we just rely on the default one provided by Spring Boot Actuator?	Document the contract. It's just the interface that matters, not the implementation.	Spring Boot Actuator interface
4	Error during cmHandle registration	If an error occurs during registration, what trustlevel should the cmHandle be set to in the following scenarios? (1) The user has provided an initial trustlevel of 'COMPLETE' (this information could be minutes old!). (2) The user has provided an initial trustlevel of 'NONE'. (3) The user has NOT provided a (valid) initial trustlevel.	Agreed to leave as is. If a notification arrives for a node that is already registered, we can process the other notification separately. 10 Oct 2023
5	Module sync watchdog issues/error scenarios	If a cmHandle is set to none/incomplete, module sync will automatically retry (is this acceptable?). If the module sync fails, we will still send a Complete message (is this acceptable?). Registering all cmHandles could take up to 20 mins; what should happen if the last sync fails, given that the notification would have been sent 20 mins ago?	When the CM Handle state is: DELETING/DELETED - no trustLevel notification update; ADVISED - no trustLevel notification update; READY - trustLevel notification update; LOCKED - trustLevel notification update. 10 Oct 2023
6	When cm handle trustLevel state stays the same	Do we include that cm handle ID or not for notifications?	No. A cm handle whose trustLevel stays the same is not included. 10 Oct 2023
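
Decisions 5 and 6 together amount to a small rule for when a trustLevel notification is emitted. The following is only a minimal sketch of that rule; the enum names, `NOTIFYING_STATES` and `should_notify` are hypothetical and not taken from the NCMP code base.

```python
from enum import Enum


class CmHandleState(Enum):
    # Hypothetical stand-in for the CM Handle states named in decision 5.
    ADVISED = "ADVISED"
    READY = "READY"
    LOCKED = "LOCKED"
    DELETING = "DELETING"
    DELETED = "DELETED"


class TrustLevel(Enum):
    # Only the two levels in scope for this epic.
    NONE = "NONE"
    COMPLETE = "COMPLETE"


# Assumption: the states for which decision 5 allows a trustLevel notification.
NOTIFYING_STATES = {CmHandleState.READY, CmHandleState.LOCKED}


def should_notify(state, old_level, new_level):
    """Return True when a trustLevel notification should be sent."""
    if old_level == new_level:
        # Decision 6: an unchanged trustLevel is not reported.
        return False
    # Decision 5: only READY and LOCKED cm handles produce a notification.
    return state in NOTIFYING_STATES
```

Keeping the rule in one pure function makes it easy to check each row of the decision table in isolation.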

Description

Define scenarios which cause a CM Handle to go stale.
Implement changes to support tracking of CM Handle Freshness/Staleness.

...

#	Interface	Requirement	Additional Information	Sign-off
1	CPS-NCMP-E-05	The 'trustlevel' is visible on all REST methods that currently include the 'cm handle state'	existing endpoints	26 Sep 2023
2	CPS-NCMP-E-05	CM Handles can be queried (filter condition) on 'trustlevel'	using a new 'trustLevel' condition (cannot use cpsPath condition)	26 Sep 2023
3	CPS-NCMP-I-01	During registration, the DMI plugin can report an initial trustlevel. If it is not 'complete', it should be considered a 'trustlevel change' (see req. 5)	The initial trust level is backward compatible: if not set, we assume the trustlevel is 'complete'. For a new cm-handle where the trustlevel is 'complete', this is NOT considered a change and no notifications should be sent	26 Sep 2023
4	CPS-NCMP-E-05	Once a DMI (plugin) is detected to be down, the trust-level for all affected CM Handles should be set to 'NONE'. This will also lead to many notifications as per req. #5	This might lead to a high number (20K) of notifications (need to discuss capabilities)	26 Sep 2023
5	CPS-NCMP-E-05.e	An NCMP notification shall be sent when the trustlevel changes	Notifications are sent externally via Kafka. Many small or bulk? Agreed: many notifications, one for each cm-handle	26 Sep 2023
6	CPS-NCMP-I-01.e	It shall be possible for the DMI plugin to report the current trustLevel of a single CM Handle	I.e. the DMI can tell NCMP the trustLevel is 'NONE' when a node heartbeat failure is detected and 'COMPLETE' once it is restored. Again, this should lead to notifications on the external interface as per req. #5	26 Sep 2023
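
Requirements 3 to 6 can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the NCMP implementation: `TrustLevelRegistry` and its methods are hypothetical names, and the `sent` list merely stands in for the external Kafka notifications.

```python
from dataclasses import dataclass, field

COMPLETE, NONE = "COMPLETE", "NONE"


@dataclass
class TrustLevelRegistry:
    """Hypothetical in-memory model of trustLevel tracking."""
    levels: dict = field(default_factory=dict)  # cm handle id -> trustLevel
    dmi_of: dict = field(default_factory=dict)  # cm handle id -> DMI plugin name
    sent: list = field(default_factory=list)    # stand-in for Kafka notifications

    def register(self, cm_handle_id, dmi, initial=None):
        # Req. 3: a missing initial trustLevel is assumed to be COMPLETE.
        level = initial or COMPLETE
        self.levels[cm_handle_id] = level
        self.dmi_of[cm_handle_id] = dmi
        # Req. 3: only a non-COMPLETE initial level counts as a change.
        if level != COMPLETE:
            self._notify(cm_handle_id, level)

    def report(self, cm_handle_id, level):
        # Req. 6: the DMI plugin reports the level of a single cm handle.
        if self.levels[cm_handle_id] != level:
            self.levels[cm_handle_id] = level
            self._notify(cm_handle_id, level)

    def dmi_down(self, dmi):
        # Req. 4: every cm handle served by the failed DMI drops to NONE.
        for cm_handle_id, owner in self.dmi_of.items():
            if owner == dmi:
                self.report(cm_handle_id, NONE)

    def _notify(self, cm_handle_id, level):
        # Req. 5: one notification per cm handle, never a bulk message.
        self.sent.append((cm_handle_id, level))
```

Routing `dmi_down` through `report` means a cm handle that is already 'NONE' does not generate a second notification when its DMI plugin goes down.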

Error Handling

#	Error Scenario	Expected behavior	Sign-off
1	NCMP restart (all instances)	To be discussed, not sure if it can/should be handled. TrustLevels should be 'NONE' and need to be restored using an audit-request (not in scope). If we restart, it should go into COMPLETE STATE. No way of getting out of NONE State. Audit was agreed to be handled in a separate epic - Prioritise audit epic	10 Oct 2023

Characteristics

#	Parameter	Expectation	Notes	Sign-off
1	dmi-down detection speed	30 seconds	60 seconds	It's a configurable value. Agreed - Should be in parallel with device heartbeat.	10 Oct 2023
2	device heartbeat frequency (message emitted by DMI plugin for each device)	60 seconds	Can be removed
3	maximum supported devices (by NCMP)	60,000	Given #2 and #3, this means NCMP needs to process 60,000 messages / minute! - Can be removed, separate epic
4	maximum number of cm-handles reported down by DMI in one request and/or per minute	30,000 / minute	a peak can be processed within 60 seconds	10 Oct 2023
5	processing time of all trustLevel updates for DMI-down and/or peak load by DMI	1 second	Agreed to go with 30,000 / minute as in #4	10 Oct 2023
6	If we incorporate trustLevel into the searches endpoints, the speed should not be impacted	30 seconds	Speed shouldn't be affected - Agreed - It's across 60,000 cm handles. Open for improvement with respect to performance	10 Oct 2023
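
As a rough check on the figures in rows 2 to 4, the per-minute numbers can be converted into per-second rates. The helper name `per_second` is hypothetical, and the per-update budget assumes strictly sequential processing, which the page does not state.

```python
def per_second(count, window_seconds):
    """Average event rate for `count` events spread over a time window."""
    return count / window_seconds


# Rows 2 and 3: 60,000 devices, each emitting one heartbeat every 60 seconds.
heartbeat_rate = per_second(60_000, 60)

# Row 4: agreed peak of 30,000 trustLevel changes per minute.
peak_rate = per_second(30_000, 60)

# Assumption: sequential processing, so this is the average time budget
# (in milliseconds) available for one trustLevel update at peak load.
budget_ms = 1000 / peak_rate

print(heartbeat_rate, peak_rate, budget_ms)
```

Under these assumptions the agreed peak is half the heartbeat rate that the full device population would generate.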

Out-of-Scope

This epic will only introduce trustLevel NONE and COMPLETE. PARTIAL and POOR may be added later as below.
Re-registration, i.e. resolving trustLevel degradation, is not in scope of this epic.

...

Versions Compared

Old Version 43

New Version 44
