...
The use of regex potentially results in full table scans, severely limiting performance, and causing query time duration to grow linearly with the database fragment table size (i.e. queries get slower as the DB gets bigger).
A new algorithm for queries is being proposed, avoiding regex, so that query duration is independent of database size.
Objective
To achieve the maximum theoretical performance for queries.
- For a query that will return 1 out of N items, the minimum theoretical time complexity is constant, O(1).
- For a query that will return all N items, the minimum theoretical time complexity is linear, O(N).
Complexity of existing solution
As will be demonstrated by performance tests later in this document, the existing solution has the following performance characteristics:
- For a query that will return 1 out of N items, the current time complexity is linear, O(N).
- For a query that will return all N items, the current time complexity is quadratic, O(N2).
Proposal
The new approach involves adding a column to the Fragment table, storing the last path component (called xpath_component here). The new column is indexed, to allow constant-time lookups.
...
Case | Query one out of many using descendant cps path | Query one out of many using absolute cps path |
---|---|---|
Query | //openroadm-device[@device-id="C201-7-1A-14"] | /openroadm-devices/openroadm-device[@device-id="C201-7-1A-19"] |
GraphComparison graph | ||
Time complexity of | O(N) | O(N) |
Time complexity of proposed solution | O(1) | O(1) |
As seen in the graphs, query performance for current master branch is linear on the size of the database, while the PoC implementation is constant time (independent of DB size).
(Note, I have not illustrated all different fetchDescendantOptions, as it has only minor impact in this case of fetching 1 device node.
Query
...
all devices
...
using descendant cps path
In this case, a query that matches many device nodes using a descendant cps path is executed:
//openroadm-device[@ne-state="inservice"]
Fetch descendants | Omit Descendants | Direct Descendants | All Descendants |
---|---|---|---|
Comparison graph | |||
Graph detail | |||
Time complexity of | O(N) | O(N2) | O(N2) |
Time complexity of proposed solution | O(N) | O(N) | O(N) |
In above cases, it is observed that the existing algorithm grows quadratically - O(n2), e.g. doubling from 1000 to 2000 nodes takes 4 times longer (going from 40 to 160 seconds).
The PoC algorithm grows linearly - O(n), e.g. doubling from 1000 to 2000 nodes takes 2 times longer (going from 3.5 to 7 seconds).
Query
...
all devices
...
using absolute cps path
In this case, a query that matches many device nodes using an absolute cps path is executed:
/openroadm-devices/openroadm-device[@status="success"]
Fetch descendants | Omit Descendants | Direct Descendants | All Descendants |
---|---|---|---|
Comparison graph | |||
Graph detail | |||
Time complexity of | O(N) | O(N2) | O(N2) |
Time complexity of proposed solution | O(N) | O(N) | O(N) |
In above cases, it is observed that the existing algorithm grows quadratically - O(n2), while the PoC algorithm grows linearly - O(n).
...