Background

The current implementation of Cps Path queries relies on regular expressions in the generated SQL queries.

...

A new algorithm for queries is being proposed, avoiding regular expressions, so that query duration is independent of database size.

Path queries using path-component lookup

Objective

The objective is to achieve the maximum theoretical performance for queries.

For a query that will return 1 out of N items, the best theoretical time complexity is constant, O(1).
For a query that will return all N items, the best theoretical time complexity is linear, O(N).

Complexity of existing solution

As will be demonstrated by performance tests later in this document, the existing solution has the following performance characteristics:

For a query that will return 1 out of N items, the current time complexity is linear, O(N).
For a query that will return all N items, the current time complexity is quadratic, O(N²).

Operation	Existing solution	Proposed solution
Query 1 out of N	O(N)	O(1)
Query all out of N	O(N²)	O(N)

Implementation Proposal

The new approach involves adding a column to the Fragment table, storing the last path component (called xpath_component here). The new column is indexed, to allow constant-time lookups.

...

Code Block

language	sql

SELECT
    * 
FROM
    fragment 
WHERE
    parent_id IN (
        SELECT
            id 
        FROM
            fragment 
        WHERE
            xpath_component = 'categories[@code=''1'']'
            AND parent_id = (
                SELECT
                    id 
                FROM
                    fragment 
                WHERE
                    xpath_component = 'bookstore'
                    AND anchor_id = 3
                    AND parent_id IS NULL
            )
        ) 
        AND (
            xpath_component = 'books'
            OR xpath_component LIKE 'books[%'
        ) 
        AND (
            attributes @> '{"title":"Matilda"}'
        )

Design decision: should xpath_component contain the leaf key?

Given this Cps Path:

/bookstore/categories[@code="1"]/books[@title="Matilda"]

...

Storing only 'books' will allow for removing the 'LIKE' operator from the SQL query, potentially speeding queries in the general case.
With further development, this could allow for GET operation (getDataNodes) to return whole lists, using an index-only lookup. (Though given how fast proposed query solution is, the existence of the GET operation is questionable.)
Storing "books[@title='Matilda']" will allow for optimization of queries where the leaf-condition references the key leaf (as defined in the Yang model), thus skipping both the LIKE operator and the attribute check, but only in this specific case.

A note on fetching descendant nodes

The above sections only partly describe the current and proposed solutions. In both cases, a CPS path query is actually sent as two database queries:

...

Code Block

language	sql

WITH RECURSIVE descendant_search AS (
  SELECT id, 0 AS depth
    FROM fragment
   WHERE id IN (:fragmentIds)
   UNION
  SELECT child.id, depth + 1
    FROM fragment child INNER JOIN descendant_search ds ON child.parent_id = ds.id
   WHERE depth <= :maxDepth
) 
SELECT f.id, anchor_id AS anchorId, xpath, f.parent_id AS parentId, CAST(attributes AS TEXT) AS attributes 
FROM fragment f INNER JOIN descendant_search ds ON f.id = ds.id

Proof of Concept

A PoC was developed so that performance could be compared against existing Cps Path Query algorithms.

Test data

These tests were performed using the openroadm model, present in the integration-test module of CPS source code repository.

In this case, each 'device' node is comprised of 86 data nodes. In these tests, 4 anchors were populated using the same data.

Performance Improvement

Query one device from many, using descendant cps path

In this case, a query that matches a single device node is executed, such as:

...

(Note, I have not illustrated all different fetchDescendantOptions, as it has only minor impact in this case of fetching 1 device node.

Query all devices using descendant cps path

In this case, a query that matches many device nodes using a descendant cps path is executed:

...

Fetch descendants	Omit Descendants	Direct Descendants	All Descendants
Comparison graph
Graph detail
Time complexity of existing solution	O(N)	O(N²)	O(N²)
Time complexity of proposed solution	O(N)	O(N)	O(N)

...

The PoC algorithm grows linearly - O(n), e.g. doubling from 1000 to 2000 nodes takes 2 times longer (going from 3.5 to 7 seconds).

Query all devices using absolute cps path

In this case, a query that matches many device nodes using an absolute cps path is executed:

...

In above cases, it is observed that the existing algorithm grows quadratically - O(n²), while the PoC algorithm grows linearly - O(n).

Possible improvements of proposed solution

Other operations can be accelerated

The same algorithm to improve query performance could also be used to improve performance of other operations, such as GET, UPDATE, and DELETE data nodes. (In fact, the exact same code could be used.)

However, some of these existing operations have "plural" implementations taking Collections as parameters, such as getDataNodesForMultipleXpaths. Additional investigation is warranted.

Index-only lookup where leaf-condition is in the xpath

Note in the following CPS path query:

...

This would allow for an index-only lookup of the item, without needing the LIKE comparison or the attribute check.

Work Breakdown for Implementation

In addition to the changes outlined above, there is additional work remaining for this change to be production-ready.

The main algorithm was mostly done during the PoC (all integration tests are passing for the PoC). The existing PoC code can thus be refactored to make it production ready.

DB upgrade

Because a new column is being added to the Fragment table, this column needs to be populated. An SQL script will be needed to provide a value for of the new xpath_component field based on existing xpath field.

Cps Path Parser changes

CpsPathBuilder and CpsPathQuery classes from cps-path-parser module will need to be updated to provide the individual path components.

...

Space shortcuts

Page tree

Versions Compared

Old Version 21

New Version 22

Key

Table of Contents

Background

Path queries using path-component lookup

Objective

Complexity of existing solution

Implementation Proposal

Design decision: should xpath_component contain the leaf key?

A note on fetching descendant nodes

Proof of Concept

Test data

Performance Improvement

Query one device from many, using descendant cps path

Query all devices using descendant cps path

Direct Descendants

All Descendants

Query all devices using absolute cps path

Possible improvements of proposed solution

Other operations can be accelerated

Index-only lookup where leaf-condition is in the xpath

Work Breakdown for Implementation

DB upgrade

Cps Path Parser changes

Space shortcuts

Page tree

Page History

Versions Compared

Old Version 21

New Version 22

Key

Table of Contents

Background

Path queries using path-component lookup

Objective

Complexity of existing solution

Implementation Proposal

Design decision: should xpath_component contain the leaf key?

A note on fetching descendant nodes

Proof of Concept

Test data

Performance Improvement

Query one device from many, using descendant cps path

Query all devices using descendant cps path

Direct Descendants

All Descendants

Query all devices using absolute cps path

Possible improvements of proposed solution

Other operations can be accelerated

Index-only lookup where leaf-condition is in the xpath

Work Breakdown for Implementation

DB upgrade

Cps Path Parser changes