Related JIRA: POLICY-2898

Proposal: MultiClusterSupport.pptx

Thoughts:

  • Common DB shared across clusters
  • PAP is not currently a bottleneck for event processing, so no more than one PAP is needed per cluster
  • Run a PAP on each cluster, in active-active, active-hot, or active-cold mode
  • Do PAPs manage PDPs across clusters?
    • Would require cross-cluster DMaaP
  • Is there a way to trigger the PAPs in other clusters to examine the DB?  (DB triggers, maybe?)
  • Separate PDP Groups for each cluster?
    • And possibly multiple groups within a cluster, to support multi-tenancy
    • Maybe don't separate them
  • Would CLAMP talk to all of the PAPs across the clusters?
    • What can Kubernetes do to support this?
  • PDPs can be active-active
    • Can deploy same policy across clusters
  • Prefer a single point for configuring policies
    • Implies a shared DB to store all policies
    • Transactions would be required to prevent conflicting updates by multiple policy-api components
  • How should the consolidated health check work with multiple clusters?  Query a PAP in each cluster?  Query one single PAP?
    • The additional services (e.g., A&AI, DMaaP) may be available in one cluster, but not the other.  How would that be reported?
  • What about pdp-policy deployment status?  Should one query report status for PDPs on all clusters?
    • If so, then that implies that the deployment status is kept in a shared DB
  • Are PAPs aware of PDPs across all clusters?
    • If not, then need a flag in the DB to indicate which PDPs are in which cluster so that PAP doesn't remove PDPs from other clusters
    • If not, then need a way to trigger the PAPs in the other clusters to deploy/undeploy policies to/from their respective PDPs
    • Can we use a shared DMaaP for POLICY-PDP-PAP topic?  Or configure the PAPs in each cluster so they can communicate with the DMaaPs in the other clusters?
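As an illustration of the "transactions to prevent conflicting updates" point above, here is a minimal sketch of one possible approach: a compare-and-set update guarded by a revision counter. Everything in it is an assumption made for illustration: the `policy` table, its `revision` column, and the `update_policy` function are hypothetical and are not the actual policy-api schema or code, and SQLite merely stands in for the shared DB.

```python
import sqlite3

# Hypothetical table; the real policy-api schema is not shown in this page.
SCHEMA = """
CREATE TABLE policy (
    name     TEXT PRIMARY KEY,
    revision INTEGER NOT NULL,
    body     TEXT NOT NULL
)
"""

def update_policy(conn, name, expected_revision, new_body):
    """Update a policy only if nobody else changed it since it was read.

    Returns True if the update was applied, False if the stored revision
    no longer matches (another writer got there first).
    """
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE policy SET body = ?, revision = revision + 1 "
            "WHERE name = ? AND revision = ?",
            (new_body, name, expected_revision),
        )
    return cur.rowcount == 1

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO policy VALUES ('p1', 1, 'original')")
    conn.commit()
    # Two writers both read revision 1; only the first update is applied.
    print(update_policy(conn, "p1", 1, "from writer A"))
    print(update_policy(conn, "p1", 1, "from writer B"))
```

With this style of check, a policy-api instance whose update is rejected would re-read the policy and retry or report a conflict. Row locking inside a transaction would be another option.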

1 Comment

  1. Some teams are OK with a global DMaaP and a shared DB.  That should work as is, needing only changes to support multiple PAPs running at the same time.

    Other teams are not OK with a global DMaaP, but are OK with a shared DB, and they're OK with policy deployments "eventually" propagating to all PDPs (e.g., OK if it doesn't happen until a heartbeat).  Because of the lack of a global DMaaP, the PAPs would need a way to know which PDPs they are responsible for.  Possible solutions:

    1. Add a "cluster identifier" field to the PDP tables and the configuration of the PAPs and PDPs; this was not acceptable to all teams
    2. Separate the PDP tables into non-shared, cluster-specific DBs.  Some downsides to this:
      1. Doesn't support a global view of policy deployments; clients would have to query a PAP in each cluster to determine the overall status
      2. PAP could not deny deployment of a policy due to no active PDPs, as there may be active PDPs in another cluster
      3. Requires multiple DB connections, as some requests would go to the shared DB, while others would go to the local DB
    3. Keep an expiration time in the PDP tables.  When any PAP receives a response from a PDP, it can update the timestamp.  The PAPs would then run a periodic task to remove PDPs whose timestamp has expired.
      1. Heartbeat timers could be eliminated in favor of the periodic task.  This could simplify the PAP code
      2. If a PDP failed to respond to a request, PAP need not remove it from the DB at that time.  In fact, requests could be done via fire-and-forget, which would greatly simplify the PAP code
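The expiration-time option (solution 3 above) could look roughly like the following. This is only a sketch under stated assumptions: the `pdp` table, the `expires_at` column, and both function names are hypothetical, and SQLite stands in for the shared DB. Times are passed in explicitly so the behavior is easy to follow.

```python
import sqlite3

# Hypothetical table: one row per PDP, holding the time after which it is stale.
SCHEMA = "CREATE TABLE pdp (name TEXT PRIMARY KEY, expires_at REAL NOT NULL)"

def record_response(conn, pdp_name, now, ttl_seconds):
    """Called by any PAP that hears from a PDP: push its expiration time forward."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO pdp (name, expires_at) VALUES (?, ?)",
            (pdp_name, now + ttl_seconds),
        )

def remove_expired(conn, now):
    """Periodic task: delete PDPs whose expiration time has passed.

    Returns the number of PDP rows removed.
    """
    with conn:
        cur = conn.execute("DELETE FROM pdp WHERE expires_at < ?", (now,))
    return cur.rowcount

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_response(conn, "pdp-a", now=0.0, ttl_seconds=120.0)
    record_response(conn, "pdp-b", now=100.0, ttl_seconds=120.0)
    # At t=150 only pdp-a (which expired at t=120) is removed.
    print(remove_expired(conn, now=150.0))
```

Because the delete is a single idempotent statement, it does not matter which PAP runs the periodic task or whether several PAPs run it at once; a PDP that is still alive is re-added by its next response.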