Background
The current implementation of Cps Path queries relies on regular expressions in the generated SQL queries.
For example, given this Cps Path:
/bookstore/categories[@code="1"]/books[@title="Matilda"]
the following SQL query is generated:
SELECT * FROM FRAGMENT WHERE anchor_id = 3 AND xpath ~ '/bookstore/categories\[@code=''1'']/books(\[@(?!.*\[).*?])?$' AND ( attributes @> '{"title":"Matilda"}' )
The use of regex potentially results in full table scans, severely limiting performance, and causing query time duration to grow linearly with the database size (i.e. queries get slower as the DB gets bigger).
A new algorithm for queries is being proposed, avoiding regex, so that query duration is independent of database size.
Proposal
The new approach involves adding a column to the Fragment table, storing the xpath component.
TODO description of new algorithm
For example, given this Cps Path:
/bookstore/categories[@code="1"]/books[@title="Matilda"]
the following SQL query is generated:
SELECT * FROM fragment WHERE parent_id IN ( SELECT id FROM fragment WHERE xpath_component = 'categories[@code=''1'']' AND parent_id = ( SELECT id FROM fragment WHERE xpath_component = 'bookstore' AND anchor_id = 3 AND parent_id IS NULL ) ) AND ( xpath_component = 'books' OR xpath_component LIKE 'books[%' ) AND ( attributes @> '{"title":"Matilda"}' )
Proof of Concept
A PoC was developed so that performance could be compared against existing Cps Path Query algorithms.
Performance Improvement
As can be seen in the cases below, the existing algorithm using regex has linear time complexity, on the the size of the database. The new algorithm is constant time.
- For querying 1 out of many nodes, existing algorithm is linear
- For querying 1 out of many nodes, existing algorithm is constant
- For querying many out of many nodes, existing algorithm is quadratic
- For querying many out of many nodes, new algorithm is linear
Time (ms) for X devices | |||||||
Case | Scenario | Query | Descendants | Branch | 50 | 500 | 1000 |
---|---|---|---|---|---|---|---|
1a | Query 1 device from many, descendant cps path | //openroadm-device[@device-id="C201-7-1A-14"] | none | master | 64 | 539 | 1,451 |
1b | none | poc | 33 | 28 | 65 | ||
2a | //openroadm-device[@device-id="C201-7-1A-25"] | direct | master | 87 | 583 | 1,580 | |
2b | direct | poc | 34 | 35 | 70 | ||
3a | //openroadm-device[@device-id="C201-7-1A-36"] | all | master | 92 | 619 | 1,693 | |
3b | all | poc | 51 | 43 | 67 | ||
4a | Query 1 device from many, absolute cps path | /openroadm-devices/openroadm-device[@device-id="C201-7-1A-19"] | none | master | 97 | 881 | 2,383 |
4b | none | poc | 28 | 14 | 14 | ||
5a | /openroadm-devices/openroadm-device[@device-id="C201-7-1A-32"] | direct | master | 119 | 900 | 2,403 | |
5b | direct | poc | 31 | 17 | 16 | ||
6a | /openroadm-devices/openroadm-device[@device-id="C201-7-1A-46"] | all | master | 127 | 912 | 2,471 | |
6b | all | poc | 44 | 27 | 28 | ||
7a | Query many devices from many, descendant cps path | //openroadm-device[@ne-state="inservice"] | none | master | 72 | 606 | 1,547 |
7b | none | poc | 40 | 167 | 737 | ||
8a | //openroadm-device[@ne-state="inservice"] | direct | master | 560 | 11,552 | 41,652 | |
8b | direct | poc | 142 | 620 | 1,428 | ||
9a | //openroadm-device[@ne-state="inservice"] | all | master | 577 | 12,027 | 43,321 | |
9b | all | poc | 227 | 1,450 | 3,502 | ||
10a | Query many devices from many, absolute cps path | /openroadm-devices/openroadm-device[@status="success"] | none | master | 99 | 892 | 1,808 |
10b | none | poc | 29 | 44 | 107 | ||
11a | /openroadm-devices/openroadm-device[@status="success"] | direct | master | 404 | 11,900 | 42,422 | |
11b | direct | poc | 68 | 325 | 797 | ||
12a | /openroadm-devices/openroadm-device[@status="success"] | all | master | 482 | 11,937 | 42,581 | |
12b | all | poc | 238 | 1,298 | 3,048 |