2019-05-13: Original post

jhartley / at - luminanetworks \ com – for any questions


Goals for akka.conf:

  1. Specify THIS cluster member's resolvable FQDN or IP address. (Tip: use FQDNs, and make sure they resolve in your environment; a quick check follows this list.)
  2. Name every cluster member in the seed-nodes list.
  3. Tune optional variables, noting that the defaults for many of these are far too low.
  4. Keep this file ~identical on all instances; only the "roles" and "hostname" are unique to this member.
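
A quick way to cover the tip in goal 1 is to confirm, on each host, that every member FQDN actually resolves. Below is a minimal Python sketch (not part of OpenDaylight) using the hypothetical hostnames from the example that follows; substitute your own:

#!/usr/bin/env python3
# Check that each cluster-member FQDN resolves from this host.
# The hostnames are the hypothetical ones from the example config below.
import socket
import sys

MEMBERS = [
    "odl1.region.customer.com",
    "odl2.region.customer.com",
    "odl3.region.customer.com",
]

failures = 0
for fqdn in MEMBERS:
    try:
        addr = socket.gethostbyname(fqdn)
        print(f"OK   {fqdn} -> {addr}")
    except socket.gaierror as err:
        print(f"FAIL {fqdn}: {err}")
        failures += 1

sys.exit(1 if failures else 0)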


Example of a 3-node configuration, tuned:

odl-cluster-data {
  akka {
    loglevel = ""
    remote {
      netty.tcp {
        hostname = "odl1.region.customer.com"
        port = 2550
      }
      use-passive-connections = off
    }
    actor {
      debug {
        autoreceive = on
        lifecycle = on
        unhandled = on
        fsm = on
        event-stream = on
      }
    }
    cluster {
      seed-nodes = [
        "akka.tcp://opendaylight-cluster-data@odl1.region.customer.com:2550", 
        "akka.tcp://opendaylight-cluster-data@odl2.region.customer.com:2550", 
        "akka.tcp://opendaylight-cluster-data@odl3.region.customer.com:2550"
      ]
      seed-node-timeout = 15s
      roles = ["member-1"]
    }
    persistence {
      journal-plugin-fallback {
        circuit-breaker {
          max-failures = 10
          call-timeout = 90s
          reset-timeout = 30s
        }
        recovery-event-timeout = 90s
      }
      snapshot-store-plugin-fallback {
        circuit-breaker {
          max-failures = 10
          call-timeout = 90s
          reset-timeout = 30s
        }
        recovery-event-timeout = 90s
      }
    }
  }
}
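
Because only "hostname" and "roles" differ between members (goal 4), the per-member files can be rendered from one shared template. A minimal sketch, assuming a hypothetical akka.conf.template that contains the literal tokens @HOSTNAME@ and @ROLE@ where those two values belong:

#!/usr/bin/env python3
# Render one akka.conf per member from a shared template (goal 4).
# akka.conf.template is a hypothetical copy of the example above with the
# literal tokens @HOSTNAME@ and @ROLE@ in place of the per-member values.

MEMBERS = {
    "member-1": "odl1.region.customer.com",
    "member-2": "odl2.region.customer.com",
    "member-3": "odl3.region.customer.com",
}

with open("akka.conf.template") as f:
    template = f.read()

for role, hostname in MEMBERS.items():
    rendered = template.replace("@HOSTNAME@", hostname).replace("@ROLE@", role)
    out_path = f"akka.conf.{role}"
    with open(out_path, "w") as out:
        out.write(rendered)
    print(f"wrote {out_path}")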


Goals for org.opendaylight.controller.cluster.datastore.cfg:

  1. This is a plain key=value (properties-style) .cfg file; if a key appears more than once, the later entry replaces the earlier one.
  2. The goal here is to significantly reduce the race conditions that occur when all members of a cluster start at once, and when a freshly restarted or "cleaned" member rejoins.


### Note: some sites use a batch-size of 1; that is not reflected here ###
persistent-actor-restart-min-backoff-in-seconds=10
persistent-actor-restart-max-backoff-in-seconds=40
persistent-actor-restart-reset-backoff-in-seconds=20
shard-transaction-commit-timeout-in-seconds=120
shard-isolated-leader-check-interval-in-millis=30000
operation-timeout-in-seconds=120
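
The three persistent-actor-restart-* values above feed an exponential-backoff restart policy: the delay before each successive restart starts near the minimum, roughly doubles per failure, is capped at the maximum, and the failure count resets once the actor has stayed up for the reset interval. A rough sketch of the resulting delay sequence (Akka also applies random jitter, ignored here):

#!/usr/bin/env python3
# Rough illustration of the restart delays implied by the
# persistent-actor-restart-* settings above (exponential backoff,
# jitter ignored). Illustrative only, not OpenDaylight code.

MIN_BACKOFF = 10   # persistent-actor-restart-min-backoff-in-seconds
MAX_BACKOFF = 40   # persistent-actor-restart-max-backoff-in-seconds

def restart_delays(failure_count):
    """Seconds to wait before each successive restart attempt."""
    return [min(MIN_BACKOFF * 2 ** n, MAX_BACKOFF) for n in range(failure_count)]

print(restart_delays(5))   # [10, 20, 40, 40, 40]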


Goals for module-shards.conf:

  1. Name which members retain copies of which data shards.
  2. The shard name fields are the "friendly" names assigned to the explicit namespaces in modules.conf.
  3. In a K8S/Swarm environment, it's easiest to keep this identical on all members.  Unique shard replication (or isolation) strategies are for another document/discussion, and require non-trivial planning.


module-shards = [
    {
        name = "default"
        shards = [
            {
                name="default"
                replicas = [
                    "member-1"
                    "member-2"
                    "member-3"
                ]
            }
        ]
    },
    {
        name = "topology"
        shards = [
            {
                name="topology"
                replicas = [
                    "member-1"
                    "member-2"
                    "member-3"
                ]
            }
        ]
    },
    {
        name = "inventory"
        shards = [
            {
                name="inventory"
                replicas = [
                    "member-1"
                    "member-2"
                    "member-3"
                ]
            }
        ]
    },
]

Thus, for example, it would be legitimate to have a single, simple entry that includes ONLY "default", if desired; in that case there would only be default-config and default-operational shards, plus some of the auto-created shards.
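
A common slip is naming a replica in module-shards.conf that doesn't match any member's "roles" value from akka.conf. Below is a naive, regex-based sanity-check sketch (it assumes the simple layout shown above and is not a real HOCON parser):

#!/usr/bin/env python3
# Cross-check the replicas named in module-shards.conf against the known
# member roles. Naive regex scan assuming the layout shown above; not a
# real HOCON parser.
import re

KNOWN_ROLES = {"member-1", "member-2", "member-3"}   # roles from each member's akka.conf

with open("module-shards.conf") as f:
    text = f.read()

replicas = set(re.findall(r'"(member-[^"]+)"', text))
unknown = replicas - KNOWN_ROLES

if unknown:
    print("Replicas with no matching role:", sorted(unknown))
else:
    print("All replicas match a known member role:", sorted(replicas))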



