Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The use of regex potentially results in full table scans, severely limiting performance, and causing query time duration to grow linearly with the database fragment table size (i.e. queries get slower as the DB gets bigger).

A new algorithm for queries is being proposed, avoiding regex, so that query duration is independent of database size.

Objective

To achieve the maximum theoretical performance for queries.

  • For a query that will return 1 out of N items, the minimum theoretical time complexity is constant, O(1).
  • For a query that will return all N items, the minimum theoretical time complexity is linear, O(N).

Complexity of existing solution

As will be demonstrated by performance tests later in this document, the existing solution has the following performance characteristics:

  • For a query that will return 1 out of N items, the current time complexity is linear, O(N).
  • For a query that will return all N items, the current time complexity is quadratic, O(N2).

Proposal

The new approach involves adding a column to the Fragment table, storing the last path component (called xpath_component here). The new column is indexed, to allow constant-time lookups.

...

CaseQuery one out of many using descendant cps pathQuery one out of many using absolute cps path
Query//openroadm-device[@device-id="C201-7-1A-14"]/openroadm-devices/openroadm-device[@device-id="C201-7-1A-19"]
GraphComparison graph

Time complexity of
existing solution

O(N)O(N)
Time complexity of
proposed solution
 O(1)O(1)

As seen in the graphs, query performance for current master branch is linear on the size of the database, while the PoC implementation is constant time (independent of DB size).

(Note, I have not illustrated all different fetchDescendantOptions, as it has only minor impact in this case of fetching 1 device node.

Query

...

all devices

...

using descendant cps path

In this case, a query that matches many device nodes using a descendant cps path is executed:

//openroadm-device[@ne-state="inservice"]
Fetch descendantsOmit Descendants

Direct Descendants

All Descendants

Comparison graph

Graph detail

Time complexity of
existing solution

O(N)O(N2)O(N2)
Time complexity of
proposed solution
O(N)O(N)O(N)

In above cases, it is observed that the existing algorithm grows quadratically - O(n2), e.g. doubling from 1000 to 2000 nodes takes 4 times longer (going from 40 to 160 seconds).

The PoC algorithm grows linearly - O(n), e.g. doubling from 1000 to 2000 nodes takes 2 times longer (going from 3.5 to 7 seconds).

Query

...

all devices

...

using absolute cps path

In this case, a query that matches many device nodes using an absolute cps path is executed:

/openroadm-devices/openroadm-device[@status="success"]
Fetch descendantsOmit DescendantsDirect DescendantsAll Descendants
Comparison graph

Graph detail

Time complexity of
existing solution

O(N)O(N2)O(N2)
Time complexity of
proposed solution
O(N)O(N)O(N)

In above cases, it is observed that the existing algorithm grows quadratically - O(n2), while the PoC algorithm grows linearly - O(n).

...