The current ACM state machine works, but it is inconsistent in the way it handles error states and failed transitions. A composition and its elements can get "stuck" in transition states.
We need to
- Specify what the current state machine is for both compositions and elements and describe what the state machine for both should be
- Specify what the behaviour of the runtime and participants should be in each state
- Specify what the behaviour of the runtime and participants should be in transitions
Specifically we need to clarify:
- State of the composition elements
- State of the overall composition is derived from the composition element states
- Admin state/Running state
- When all the elements are fully up and configured, they go to state Passive; when all elements are in Passive, the full composition goes to Passive
- Error states: Are they parallel states or part of the same state?
- There should be “it didn’t work” states like “Passive-Error” or “Run_Error” (names to be decided later)
- Describe what the “Running” state means and what the participant should do in Passive->Running and Running->Passive transitions.
- Say a K8S service crashes: how do we feed that back? Possibly with a Running_Error state. The state of the pod is only checked during startup; it is not checked periodically. There should be supervision.
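The supervision point above could look roughly like the following. This is a minimal sketch, not the participant implementation: `supervise_once` and the `is_healthy` callback are hypothetical names, `RUN_ERROR` is a placeholder for a state name that is still to be decided, and returning to RUNNING when a later check passes is an assumption.

```python
from typing import Callable

RUNNING = "RUNNING"
RUN_ERROR = "RUN_ERROR"  # placeholder name; the final name is still to be decided


def supervise_once(current_state: str, is_healthy: Callable[[], bool]) -> str:
    """Run one supervision check and return the element's next state.

    Only RUNNING and RUN_ERROR are affected. Any other state is returned
    unchanged, because supervision applies only while the element is running.
    """
    if current_state not in (RUNNING, RUN_ERROR):
        return current_state
    try:
        healthy = is_healthy()
    except Exception:
        # A probe that cannot complete is treated as a failed check.
        healthy = False
    return RUNNING if healthy else RUN_ERROR
```

A scheduler in the participant would call `supervise_once` at a fixed interval and report the result to the runtime whenever the returned state differs from the current one.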
State Machine for Automation Compositions
Current State Machine
- Composition in UNINITIALIZED state: all elements of a composition are in UNINITIALIZED state, no applications are deployed, and policy types are neither deployed nor present in Api.
- User triggers to move the composition from UNINITIALIZED to PASSIVE: runtime-acm moves elements from UNINITIALIZED state to UNINITIALIZED_TO_PASSIVE.
- Element in UNINITIALIZED_TO_PASSIVE:
- participant-ks8: deploys applications
- participant-policy: creates policy types in Api and deploys them with Pap.
- participant-http: configures applications.
- Element in PASSIVE state:
- participant-ks8: applications are deployed.
- participant-policy: policy types are created in Api and deployed with Pap.
- participant-http: applications are configured.
- Composition in PASSIVE state: all elements are in PASSIVE state, all applications are deployed and configured.
- User triggers to move the composition from PASSIVE to UNINITIALIZED: runtime-acm moves elements from PASSIVE state to PASSIVE_TO_UNINITIALIZED.
- Element in PASSIVE_TO_UNINITIALIZED:
- participant-ks8: undeploys applications
- participant-policy: undeploys policy types with Pap and deletes them in Api.
- participant-http: do nothing
- Element in UNINITIALIZED state:
- participant-ks8: applications are undeployed.
- participant-policy: policy types are not deployed and not present in Api.
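The current machine described above can be written down as two small lookup tables. This is an illustrative sketch only: the `State` enum, the table names, and the `request` and `complete` helpers are hypothetical and not taken from the runtime-acm code. It shows why an element can get stuck: there is no state to move to when a transition fails.

```python
from enum import Enum


class State(Enum):
    UNINITIALIZED = "UNINITIALIZED"
    UNINITIALIZED_TO_PASSIVE = "UNINITIALIZED_TO_PASSIVE"
    PASSIVE = "PASSIVE"
    PASSIVE_TO_UNINITIALIZED = "PASSIVE_TO_UNINITIALIZED"


# (current stable state, state requested by the user) -> transition state
START = {
    (State.UNINITIALIZED, State.PASSIVE): State.UNINITIALIZED_TO_PASSIVE,
    (State.PASSIVE, State.UNINITIALIZED): State.PASSIVE_TO_UNINITIALIZED,
}

# transition state -> stable state reached when the participant succeeds
FINISH = {
    State.UNINITIALIZED_TO_PASSIVE: State.PASSIVE,
    State.PASSIVE_TO_UNINITIALIZED: State.UNINITIALIZED,
}


def request(current: State, target: State) -> State:
    """Return the transition state for a user request, or raise if not allowed."""
    try:
        return START[(current, target)]
    except KeyError:
        raise ValueError(
            f"no transition from {current.name} to {target.name}"
        ) from None


def complete(current: State, succeeded: bool) -> State:
    """Return the state after the participant finishes its work.

    This machine has no error states, so a failed transition leaves the
    element where it is: in the transition state.
    """
    if succeeded and current in FINISH:
        return FINISH[current]
    return current
```

For example, `complete(State.UNINITIALIZED_TO_PASSIVE, succeeded=False)` returns `State.UNINITIALIZED_TO_PASSIVE` again, which is the "stuck" situation.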
Proposed State Machine
State Machine for Automation Composition Elements
Current State Machine
TBC
Proposed State Machine
First Proposed State Machine
- Composition in UNINITIALIZED state: all elements of the composition are in UNINITIALIZED state, no applications are deployed, and policy types are neither deployed nor present in Api.
- User triggers to move the composition from UNINITIALIZED to PASSIVE: runtime-acm moves elements from UNINITIALIZED state to UNINITIALIZED_TO_PASSIVE.
- Element in UNINITIALIZED_TO_PASSIVE:
- participant-ks8: deploys applications
- participant-policy: creates policy types in Api and deploys them with Pap.
- participant-http: checks if applications are healthy.
- Element in UNINITIALIZED_TO_PASSIVE_ERROR state: the participant encountered an error during deployment.
- Composition in UNINITIALIZED_TO_PASSIVE_ERROR state: at least one element is in UNINITIALIZED_TO_PASSIVE_ERROR state.
- User can re-try UNINITIALIZED_TO_PASSIVE.
- User can go back to UNINITIALIZED.
- Element in PASSIVE state:
- participant-ks8: applications are deployed.
- participant-policy: policy types are created in Api and deployed with Pap.
- participant-http: applications are healthy but not configured yet.
- Composition in PASSIVE state: all elements are moved to PASSIVE, all applications are deployed but not configured.
- User triggers to move the composition from PASSIVE to RUNNING: runtime-ACM moves elements from PASSIVE state to PASSIVE_TO_RUNNING.
- Element in PASSIVE_TO_RUNNING state:
- participant-ks8: do nothing (maybe checks if applications are running).
- participant-policy: do nothing (maybe checks if policy types are running).
- participant-http: configures applications.
- Element in PASSIVE_TO_RUNNING_ERROR state: the participant encountered an error during configuration.
- Composition in PASSIVE_TO_RUNNING_ERROR state: at least one element is in PASSIVE_TO_RUNNING_ERROR state.
- Element in RUNNING state:
- participant-ks8: applications are deployed (periodically checks if applications are running).
- participant-policy: policy types are created in Api and deployed with Pap.
- participant-http: applications are healthy and configured (periodically checks if applications are healthy).
- Composition in RUNNING state: all elements of the composition are in RUNNING state, all applications are running.
- Element in RUN_ERROR state: the participant encountered an error while in the RUNNING state (it periodically checks if applications are running).
- Composition in RUN_ERROR state: at least one element is in RUN_ERROR state.
- User could decide to move the composition from RUN_ERROR to PASSIVE state.
- If the application has been restarted by Kubernetes, the participant detects that the application is running and moves the element from RUN_ERROR to RUNNING.
- User triggers to move the composition from RUNNING to PASSIVE: runtime-acm moves elements from RUNNING state to RUNNING_TO_PASSIVE.
- Element in RUNNING_TO_PASSIVE:
- participant-ks8: do nothing
- participant-policy: do nothing
- participant-http: remove configuration
- Element in RUNNING_TO_PASSIVE_ERROR state: the participant encountered an error while removing the configuration.
- Composition in RUNNING_TO_PASSIVE_ERROR state: at least one element is in RUNNING_TO_PASSIVE_ERROR state.
- User triggers to move the composition from PASSIVE state to UNINITIALIZED: runtime-acm moves elements from PASSIVE state to PASSIVE_TO_UNINITIALIZED.
- Element in PASSIVE_TO_UNINITIALIZED:
- participant-ks8: undeploys applications
- participant-policy: undeploys policy types with Pap and deletes them in Api.
- participant-http: do nothing
- Element in PASSIVE_TO_UNINITIALIZED_ERROR state: the participant encountered an error during undeployment.
- Composition in PASSIVE_TO_UNINITIALIZED_ERROR state: at least one element is in PASSIVE_TO_UNINITIALIZED_ERROR state.
- Element in UNINITIALIZED state:
- participant-ks8: applications are undeployed.
- participant-policy: policy types are not deployed and not present in Api.
- In any error state, the user can retry the operation.
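The composition-level rules in the list above (a composition reaches a state when all of its elements have reached it, and enters an error state when at least one element has) can be sketched as a single function. The function name and signature are hypothetical. The handling of the mixed case, where some elements have finished and others are still in transition, is an assumption: the composition is kept in the transition state. RUN_ERROR is not covered here because it is raised during supervision rather than during a transition.

```python
from typing import Iterable


def derive_composition_state(
    transition: str, target: str, element_states: Iterable[str]
) -> str:
    """Derive the composition state while a transition is in progress.

    transition: the transition state, e.g. "UNINITIALIZED_TO_PASSIVE"
    target: the stable state the transition leads to, e.g. "PASSIVE"
    element_states: the current state of every element in the composition
    """
    states = list(element_states)
    if not states:
        raise ValueError("a composition needs at least one element")
    error_state = transition + "_ERROR"
    if any(state == error_state for state in states):
        # At least one element failed: the whole composition is in error.
        return error_state
    if all(state == target for state in states):
        # Every element arrived: the composition has arrived too.
        return target
    # Assumption: otherwise the composition is still in transition.
    return transition
```

For example, with elements in `["PASSIVE", "UNINITIALIZED_TO_PASSIVE_ERROR"]` the composition is reported as `UNINITIALIZED_TO_PASSIVE_ERROR`, even though one element completed successfully.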
Note:
With this solution, the user can move from RUNNING to PASSIVE, update the service template related to the configuration (participant-http) while the applications are still up, and afterwards move from PASSIVE to RUNNING.
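The first proposal can be summarised as a pair of tables plus a retry rule. This is a sketch under stated assumptions, not the runtime-acm implementation: the table and function names are hypothetical, only the user-triggered transitions listed above are modelled, and the "go back" paths out of error states and the RUN_ERROR handling are left out.

```python
# (stable state, state requested by the user) -> transition state
REQUESTS = {
    ("UNINITIALIZED", "PASSIVE"): "UNINITIALIZED_TO_PASSIVE",
    ("PASSIVE", "RUNNING"): "PASSIVE_TO_RUNNING",
    ("RUNNING", "PASSIVE"): "RUNNING_TO_PASSIVE",
    ("PASSIVE", "UNINITIALIZED"): "PASSIVE_TO_UNINITIALIZED",
}

# transition state -> (state on success, state on failure)
OUTCOMES = {
    "UNINITIALIZED_TO_PASSIVE": ("PASSIVE", "UNINITIALIZED_TO_PASSIVE_ERROR"),
    "PASSIVE_TO_RUNNING": ("RUNNING", "PASSIVE_TO_RUNNING_ERROR"),
    "RUNNING_TO_PASSIVE": ("PASSIVE", "RUNNING_TO_PASSIVE_ERROR"),
    "PASSIVE_TO_UNINITIALIZED": ("UNINITIALIZED", "PASSIVE_TO_UNINITIALIZED_ERROR"),
}

_ERROR_SUFFIX = "_ERROR"


def start(current: str, target: str) -> str:
    """Return the transition state entered for a user request."""
    key = (current, target)
    if key not in REQUESTS:
        raise ValueError(f"no transition from {current} to {target}")
    return REQUESTS[key]


def finish(transition: str, succeeded: bool) -> str:
    """Return the state reached when the participant reports its result."""
    on_success, on_failure = OUTCOMES[transition]
    return on_success if succeeded else on_failure


def retry(error_state: str) -> str:
    """Return the transition state to re-enter when the user retries."""
    if not error_state.endswith(_ERROR_SUFFIX):
        raise ValueError(f"{error_state} is not an error state")
    transition = error_state[: -len(_ERROR_SUFFIX)]
    if transition not in OUTCOMES:
        # RUN_ERROR ends up here: it is not the result of a transition.
        raise ValueError(f"{error_state} cannot be retried as a transition")
    return transition
```

The scenario in the note then reads as `finish(start("RUNNING", "PASSIVE"), True)`, an update of the service template, and `finish(start("PASSIVE", "RUNNING"), True)`. A failed step yields the matching error state, from which `retry` re-enters the same transition instead of leaving the element stuck.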
Second Proposed State Machine
- Composition in UNINITIALIZED state: all elements of the composition are in UNINITIALIZED state, no applications are deployed, and policy types are neither deployed nor present in Api.
- User triggers to move the composition from UNINITIALIZED to PASSIVE: runtime-acm moves elements from UNINITIALIZED state to UNINITIALIZED_TO_PASSIVE.
- Element in UNINITIALIZED_TO_PASSIVE:
- participant-ks8: deploys applications
- participant-policy: creates policy types in Api and deploys them with Pap.
- participant-http: configures applications.
- Element in UNINITIALIZED_TO_PASSIVE_ERROR state: the participant encountered an error during deployment.
- Composition in UNINITIALIZED_TO_PASSIVE_ERROR state: at least one element is in UNINITIALIZED_TO_PASSIVE_ERROR state.
- User can re-try UNINITIALIZED_TO_PASSIVE.
- User can go back to UNINITIALIZED.
- Element in PASSIVE state:
- participant-ks8: applications are deployed.
- participant-policy: policy types are created in Api and deployed with Pap.
- participant-http: applications are configured.
- Composition in PASSIVE state: all elements are moved to PASSIVE, all applications are deployed and configured. Runtime-ACM automatically moves the composition from PASSIVE to RUNNING: runtime-ACM moves elements from PASSIVE state to PASSIVE_TO_RUNNING.
- Element in PASSIVE_TO_RUNNING state:
- participant-ks8: starts monitoring if applications are running.
- participant-policy: do nothing (maybe starts monitoring if policy types are running).
- participant-http: starts monitoring if applications are healthy.
- Element in PASSIVE_TO_RUNNING_ERROR state: the participant encountered an error during configuration.
- Composition in PASSIVE_TO_RUNNING_ERROR state: at least one element is in PASSIVE_TO_RUNNING_ERROR state.
- Element in RUNNING state:
- participant-ks8: monitoring if applications are running.
- participant-policy: do nothing (maybe monitoring if policy types are running).
- participant-http: monitoring if applications are healthy.
- Composition in RUNNING state: all elements of the composition are in RUNNING state, all applications are running.
- Element in RUN_ERROR state: the participant encountered an error while in the RUNNING state (it periodically checks if applications are running).
- Composition in RUN_ERROR state: at least one element is in RUN_ERROR state.
- User could decide to move the composition from RUN_ERROR to PASSIVE state.
- If the application has been restarted by Kubernetes, the participant detects that the application is running and moves the element from RUN_ERROR to RUNNING.
- User triggers to move the composition from RUNNING to PASSIVE: runtime-acm moves elements from RUNNING state to RUNNING_TO_PASSIVE.
- Element in RUNNING_TO_PASSIVE:
- participant-ks8: stop monitoring
- participant-policy: stop monitoring
- participant-http: stop monitoring
- User triggers to move the composition from PASSIVE state to UNINITIALIZED: runtime-acm moves elements from PASSIVE state to PASSIVE_TO_UNINITIALIZED.
- Element in PASSIVE_TO_UNINITIALIZED:
- participant-ks8: undeploys applications
- participant-policy: undeploys policy types with Pap and deletes them in Api.
- participant-http: do nothing
- Element in PASSIVE_TO_UNINITIALIZED_ERROR state: the participant encountered an error during undeployment.
- Composition in PASSIVE_TO_UNINITIALIZED_ERROR state: at least one element is in PASSIVE_TO_UNINITIALIZED_ERROR state.
- Element in UNINITIALIZED state:
- participant-ks8: applications are undeployed.
- participant-policy: policy types are not deployed and not present in Api.
- In any error state, the user can retry the operation.
Note:
With this solution, the user can move from RUNNING to PASSIVE, update the service template related to the configuration (participant-http) while the applications are still up, and afterwards move from PASSIVE to RUNNING.
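Two points distinguish the second proposal: runtime-ACM starts PASSIVE_TO_RUNNING on its own, and the RUNNING state is only about monitoring. The sketch below illustrates both. The function names are hypothetical. It also makes one assumption that the text does not settle: the automatic move happens only when PASSIVE was reached from UNINITIALIZED_TO_PASSIVE, so that a user-triggered RUNNING to PASSIVE move is not immediately undone.

```python
from typing import Optional

# States in which a participant keeps checking its applications.
MONITORED_STATES = frozenset({"RUNNING", "RUN_ERROR"})


def automatic_follow_up(completed_transition: str, reached_state: str) -> Optional[str]:
    """Return the transition runtime-ACM starts by itself, or None.

    Assumption: only a completed UNINITIALIZED_TO_PASSIVE is followed
    automatically by PASSIVE_TO_RUNNING.
    """
    if completed_transition == "UNINITIALIZED_TO_PASSIVE" and reached_state == "PASSIVE":
        return "PASSIVE_TO_RUNNING"
    return None


def is_monitoring(element_state: str) -> bool:
    """Tell whether a participant is monitoring in the given element state.

    Monitoring starts during PASSIVE_TO_RUNNING and stops during
    RUNNING_TO_PASSIVE, so it is active in RUNNING and RUN_ERROR.
    """
    return element_state in MONITORED_STATES
```

Keeping monitoring active in RUN_ERROR is what allows the participant to notice that Kubernetes has restarted an application and to move the element back to RUNNING.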