...

PlantUML Macro
@startuml

participant "ACM Runtime"
participant "Participant-intermediary"
participant Participant

group Deploying the instance
  activate "ACM Runtime"
  "ACM Runtime" -> "Participant-intermediary" : [ASYNC] Deploying the instance
  deactivate "ACM Runtime"
  activate "Participant-intermediary"
  "Participant-intermediary" -> Participant : Create Deploy thread
  activate Participant
  deactivate "Participant-intermediary"
  note right
  Deploy thread is stuck
  end note
end

group Instance in Timeout
  activate "ACM Runtime"
  "ACM Runtime" -> "ACM Runtime" : set instance in Timeout
  deactivate "ACM Runtime"
end

group Undeploying the instance
  activate "ACM Runtime"
  "ACM Runtime" -> "Participant-intermediary" : [ASYNC] Undeploying the instance
  deactivate "ACM Runtime"
  activate "Participant-intermediary"
  "Participant-intermediary" -> Participant : Terminate Deploy thread
  deactivate Participant
  "Participant-intermediary" -> Participant : Create Undeploy thread
  activate Participant
  deactivate "Participant-intermediary"
  Participant -> "Participant-intermediary" : instance Undeployed
  activate "Participant-intermediary"
  deactivate Participant
  "Participant-intermediary" -> "ACM Runtime" : [ASYNC] instance Undeployed
  deactivate "Participant-intermediary"
end

@enduml

Solutions

Solution 1: Replicas and Dynamic participantId - still using cache

Changes in Participant:

  • The UUID participantId will be generated in memory instead of being fetched from the properties file.
  • The consumerGroup will be generated in memory instead of being fetched from the properties file (see the sketch below).
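
A minimal sketch of this change, assuming a simple holder class (the class and field names are illustrative, not the actual participant code):

import java.util.UUID;

public class ParticipantIdentity {

    // Generated once per process, so every replica starts with its own identity.
    private final UUID participantId = UUID.randomUUID();
    private final String consumerGroup = "ppnt-" + UUID.randomUUID();

    public UUID getParticipantId() {
        return participantId;
    }

    public String getConsumerGroup() {
        return consumerGroup;
    }
}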

...

  • When a participant goes OFF_LINE (a sketch of this failover logic follows the list):
    • if there are compositions connected to that participant, ACM-runtime will find another ON_LINE participant with the same supported element type;
    • if another ON_LINE participant is present, it will move the connection of all compositions and instances to it;
    • after that, it will execute a restart of all compositions and instances on the ON_LINE participant.
  • When a participant REGISTER is received:
    • it will check whether there are compositions connected to an OFF_LINE participant with the same supported element type;
    • if there are, it will move the connection of all compositions and instances to the newly registered participant;
    • after that, it will execute a restart of all compositions and instances that were changed.
  • Refactor the restarting scenario to apply restarting only to compositions and instances in transition.
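
A minimal sketch of the failover step, using simplified stand-in types rather than the real ACM-runtime model:

import java.util.*;

class Participant {
    UUID id = UUID.randomUUID();
    boolean online;
    Set<String> supportedElementTypes = new HashSet<>();
}

class Composition {
    UUID participantId;
}

class FailoverHandler {
    List<Participant> participants = new ArrayList<>();
    List<Composition> compositions = new ArrayList<>();

    void onParticipantOffline(Participant offline) {
        List<Composition> affected = compositions.stream()
                .filter(c -> offline.id.equals(c.participantId)).toList();
        if (affected.isEmpty()) {
            return; // nothing is connected to the failed participant
        }
        // Find another ON_LINE participant with the same supported element types.
        participants.stream()
                .filter(p -> p.online && !p.id.equals(offline.id))
                .filter(p -> p.supportedElementTypes.containsAll(offline.supportedElementTypes))
                .findFirst()
                .ifPresent(target -> {
                    affected.forEach(c -> c.participantId = target.id); // re-connect
                    restartAll(affected, target); // restart on the new participant
                });
    }

    void restartAll(List<Composition> affected, Participant target) {
        // Placeholder: send the restart messages for the moved compositions/instances.
    }
}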

Issues:

  • Participants create the participantId and Kafka consumerGroup randomly. This solution has been tested and has the issue of creating a new Kafka queue in the restarting scenario:
    during a restart, a new consumerGroup is created, which causes some initial messages to be missed due to the creation of the new Kafka queue. The result is that the participant fails to receive the messages from ACM needed to restore compositions and instances (see the sketch below).
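
One plausible mechanism for the missed messages, assuming standard Kafka client defaults: a brand-new consumer group has no committed offsets, so with auto.offset.reset=latest it only sees records produced after it joins. The topic name below is illustrative:

import java.util.List;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RestartConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        // A fresh, random group id has no committed offsets ...
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ppnt-" + UUID.randomUUID());
        // ... so "latest" skips every record published before the first
        // assignment, including restore messages sent during the restart.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (var consumer = new KafkaConsumer<String, String>(props)) {
            consumer.subscribe(List.of("acm-runtime-participant"));
        }
    }
}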

Solution 2: StatefulSets - still uses cache

Participant replicas can be deployed as a Kubernetes StatefulSet whose pods consume two different properties files with unique UUIDs and unique consumer groups.
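
A minimal sketch of how each replica could select its own properties file, assuming the standard StatefulSet pod-naming convention (the file path is illustrative):

import java.nio.file.Path;

public class ReplicaConfig {
    public static void main(String[] args) {
        // StatefulSet pods get stable hostnames such as "policy-http-ppnt-0".
        String hostname = System.getenv().getOrDefault("HOSTNAME", "policy-http-ppnt-0");
        String ordinal = hostname.substring(hostname.lastIndexOf('-') + 1);
        // Each replica loads the properties file holding its unique
        // participantId and consumer group.
        Path props = Path.of("/etc/participant", "participant-" + ordinal + ".properties");
        System.out.println("Loading " + props);
    }
}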

...

Note: In a scenario with two participant replicas (call them "policy-http-ppnt-0" and "policy-http-ppnt-1"), ACM-Runtime will randomly assign any composition definition at prime time to a specific participant, based on the supported element definition type. So we could have a scenario where the composition definition "composition 1.0.0" and its instance are assigned to policy-http-ppnt-0, while the new composition "composition 1.0.1" is assigned to policy-http-ppnt-1.

Issues:

  • At migration time: the migration of an instance from "composition 1.0.0" to "composition 1.0.1" would not work, because policy-http-ppnt-0 does not have "composition 1.0.1" assigned. This is a critical issue.

Solution 3: Replicas and Database support - no cache

Changes in Participant:

  • Redesign the TimeOut scenario: the Participant has the responsibility to stop the thread in execution after a specific time (see the first sketch after this list).
  • Add client support for a database (MariaDB or PostgreSQL).
  • Add a mock database for Unit Tests.
  • Refactor CacheProvider into ParticipantProvider to support insert/update in the participant-intermediary with transactions (see the second sketch after this list).
  • Refactor the Intermediary to use the insert/update of ParticipantProvider.
  • Refactor Participants that use their own in-memory HashMap (the Policy Participant saves policies and policy types in memory).
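
A minimal sketch of the redesigned TimeOut handling, assuming a plain ExecutorService (class and method names are illustrative):

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DeployExecutor {
    private final ExecutorService executor = Executors.newCachedThreadPool();

    public void deployWithTimeout(Runnable deployTask, long timeoutSeconds) {
        Future<?> future = executor.submit(deployTask);
        try {
            future.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck Deploy thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // the deploy task itself failed; report the failure to ACM-runtime
        }
    }
}

And a sketch of the shape the ParticipantProvider refactor could take; the types are simplified stand-ins, not the actual ONAP code:

import java.util.Optional;
import java.util.UUID;

record AutomationCompositionInfo(UUID instanceId, String deployState) {}

// Replacement for the in-memory CacheProvider: state lives in the database,
// so any replica can handle any message.
interface ParticipantProvider {

    Optional<AutomationCompositionInfo> findInstance(UUID instanceId);

    // insert/update run inside a transaction so concurrent replicas
    // cannot overwrite each other's changes
    void saveInstance(AutomationCompositionInfo instance);
}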

...

  • The DB migrator will alter older versions of the DB to add the new parts of the schema required by this participant change
  • Liquibase will be used for script generation
  • A separate image is needed for the DB Migrator - this will have to be released as a new dependency
  • A new Job in Kubernetes and a new service in Docker should be added for this migration

Advantages of DB use

  • Multiple participant replicas are possible - messages can be handled across many participants
  • All participants share the same group-id in Kafka (see the sketch after this list)
  • All participants share the same participant-id.
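
A sketch of that shared-identity configuration, with illustrative values: because every replica joins the same Kafka consumer group, Kafka assigns each partition to exactly one replica, so the message load is spread instead of duplicated:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class SharedIdentityConfig {
    // With DB-backed state, every replica can use one fixed identity.
    static final String PARTICIPANT_ID = "101c62b3-8918-41b9-a747-d21eb79c6c03";

    static Properties consumerProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        // The same group.id on all replicas: Kafka load-balances the
        // topic partitions across them.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "policy-http-ppnt");
        return props;
    }
}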

Solution 4: Distributed Cache

Issues:

  • Not persistent - if the application that handles the cache server restarts, data is lost.
  • Approval issues with Redis, Etcd, and Search Engine.

Optimal Solution:

After analysis, it is clear that the best solution to use is number 3.

  • An arbitrary number of participant replicas is possible
  • The DB migrator upgrades older database versions
  • The restart scenario is no longer applicable and could be removed
  • Approval is not an issue - PostgreSQL is already used by ACM