You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 4 Next »

Background

The current implementation of Cps Path queries relies on regular expressions in the generated SQL queries.

For example, given this Cps Path:

/bookstore/categories[@code="1"]/books[@title="Matilda"]

the following SQL query is generated:

SELECT
    * 
FROM
    FRAGMENT 
WHERE
    anchor_id = 3
    AND xpath ~ '/bookstore/categories\[@code=''1'']/books(\[@(?!.*\[).*?])?$' 
    AND (
        attributes @> '{"title":"Matilda"}'
    )

The use of regex potentially results in full table scans, severely limiting performance, and causing query time duration to grow linearly with the database size (i.e. queries get slower as the DB gets bigger).

A new algorithm for queries is being proposed, avoiding regex, so that query duration is independent of database size.

Proposal

The new approach involves adding a column to the Fragment table, storing the xpath component.

TODO description of new algorithm

For example, given this Cps Path:

/bookstore/categories[@code="1"]/books[@title="Matilda"]

the following SQL query is generated:

SELECT
    * 
FROM
    fragment 
WHERE
    parent_id IN (
        SELECT
            id 
        FROM
            fragment 
        WHERE
            xpath_component = 'categories[@code=''1'']'
            AND parent_id = (
                SELECT
                    id 
                FROM
                    fragment 
                WHERE
                    xpath_component = 'bookstore'
                    AND anchor_id = 3
                    AND parent_id IS NULL
            )
        ) 
        AND (
            xpath_component = 'books'
            OR xpath_component LIKE 'books[%'
        ) 
        AND (
            attributes @> '{"title":"Matilda"}'
        )

Proof of Concept

A PoC was developed so that performance could be compared against existing Cps Path Query algorithms.

Performance Improvement

As can be seen in the cases below, the existing algorithm using regex has linear time complexity, on the the size of the database. The new algorithm is constant time.

  • For querying 1 out of many nodes, existing algorithm is linear
  • For querying 1 out of many nodes, existing algorithm is constant
  • For querying many out of many nodes, existing algorithm is quadratic
  • For querying many out of many nodes, new algorithm is linear





Time (ms) for X devices
CaseScenarioQueryDescendantsBranch505001000
1aQuery 1 device from many, descendant cps path
//openroadm-device[@device-id="C201-7-1A-14"]
nonemaster645391,451
1b

nonepoc332865
2a
//openroadm-device[@device-id="C201-7-1A-25"]
directmaster875831,580
2b

directpoc343570
3a
//openroadm-device[@device-id="C201-7-1A-36"]
allmaster926191,693
3b

allpoc514367
4aQuery 1 device from many, absolute cps path
/openroadm-devices/openroadm-device[@device-id="C201-7-1A-19"]
nonemaster978812,383
4b

nonepoc281414
5a
/openroadm-devices/openroadm-device[@device-id="C201-7-1A-32"]
directmaster1199002,403
5b

directpoc311716
6a
/openroadm-devices/openroadm-device[@device-id="C201-7-1A-46"]
allmaster1279122,471
6b

allpoc442728
7aQuery many devices from many, descendant cps path
//openroadm-device[@ne-state="inservice"]
nonemaster726061,547
7b

nonepoc40167737
8a
//openroadm-device[@ne-state="inservice"]
directmaster56011,55241,652
8b

directpoc1426201,428
9a
//openroadm-device[@ne-state="inservice"]
allmaster57712,02743,321
9b

allpoc2271,4503,502
10aQuery many devices from many, absolute cps path
/openroadm-devices/openroadm-device[@status="success"]
nonemaster998921,808
10b

nonepoc2944107
11a
/openroadm-devices/openroadm-device[@status="success"]
directmaster40411,90042,422
11b

directpoc68325797
12a
/openroadm-devices/openroadm-device[@status="success"]
allmaster48211,93742,581
12b

allpoc2381,2983,048
  • No labels