Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Error detection is very fast: less than 1 second
  • Recovery:
    • Kill docker container, it normally takes less than 1 minute to get the system in normal state. (SDNC, APPS APPC will take up 5 minutes)
    • Delete the pod, it normally takes much longer to get back specially for SDNC, APPC (up to 15 minutes). 
  • Note: Helm upgrade sometimes messed up the whole system, which will turn the system into un-useable status. However, we think this may not be a normal use case for production env.

...

Area

Priority

Min. Level

Stretch Goal

Level Descriptions (Abbreviated)

Resiliency

High

Level 2 – run-time projects
Level 1 – remaining projects

Level 3 – run-time projects
Level 2 – remaining projects

•1 – manual failure and recovery (< 30 minutes)
•2 – automated detection and recovery (single site) (<30 miutesminutes)
•3 – automated detection and recovery (geo redundancy)