Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The current implementation of Cps Path queries relies on regular expressions in the generated SQL queries.

SQL query using regular expression on the full path

For example, given this Cps Path:

...

A new solution is being proposed where each path component of the Cps Path is looked up in constant time, thus addressing performance issues.

Objective

...

The objective is to achieve the maximum best possible performance for queries.

  • For a query that will return 1 out of N items, the best possible time complexity is constant, O(1).
  • For a query that will return all N items, the best possible time complexity is linear, O(N).

Implementation Proposal

An additional fragment table field is needed to store path components

The new approach involves adding a column to the Fragment table, storing the last path component (called xpath_component here). The new column is indexed, to allow constant-time lookups.

idparent_idanchor_idxpathxpath_component
1NULL3/bookstorebookstore
213/bookstore/categories[@code='1']categories[@code='1']
323/bookstore/categories[@code='1']/books[@title='Matilda']books[@title='Matilda']
423/bookstore/categories[@code='1']/books[@title='The Gruffalo']books[@title='The Gruffalo']

SQL query for path-component lookup

For example, given this CPS Path:

...

In addition, a new algorithm for fetching descendants is implemented.

Fetching descendants using Recursive SQL query

...

The proposed solution will use recursive SQL to fetch descendants of fragment returned from the CPS path query:

...

Existing solutionProposed solution

Because the existing solution uses a regex, all nodes in anchor 3 need to be examined to see if they match the regex.

Because the proposed solution uses sub-queries to look up each path component, only relevant nodes are examined.

  • Green and yellow nodes have been checked against the regex.
  • Yellow nodes matched the regex, and thus have JSON attributes examined.

  • Green nodes have been looked up using an index-only lookup.
  • Yellow nodes have had JSON attributes examined.

If the number of nodes is doubled, the number of xpaths checked against the regex is also doubled (linear complexity).

If the number of nodes is doubled, the same number of nodes is still scanned in this case (constant complexity).

Proof of Concept

A proof of concept (poc) implementation was developed so that performance could be compared against existing Cps Path query algorithms.

...

The test timings used to generate the charts in this document is available as a spreadsheet: CPS-1646 CpsPath queries using path-component lookup.xlsx

Detailed Performance Comparison

Query one device out of many (1 out of N)

...

Fetch descendantsOmit DescendantsDirect DescendantsAll Descendants
Query/openroadm-devices/openroadm-device[@status="success"]/openroadm-devices/openroadm-device[@status="success"]/openroadm-devices/openroadm-device[@status="success"]
Comparison graph

Graph detail

Time complexity of
existing solution

Unclear, as there is very high variance. Likely O(N).O(N2)O(N2)
Time complexity of
proposed solution
O(N)O(N)O(N)

Possible improvements

...

Cps Path Query capabilities can be extended

...