Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

A new algorithm for queries is being proposed, avoiding regex, so that query duration is independent of database size.

Objective

To The objective is to achieve the maximum theoretical performance for queries.

...

  • For a query that will return 1 out of N items, the current time complexity is linear, O(N).
  • For a query that will return all N items, the current time complexity is quadratic, O(N2).


OperationExisting solutionProposed solution
Query 1 out of NO(N)O(1)
Query all out of NO(N2)O(N)

Proposal

The new approach involves adding a column to the Fragment table, storing the last path component (called xpath_component here). The new column is indexed, to allow constant-time lookups.

...

In above cases, it is observed that the existing algorithm grows quadratically - O(n2), while the PoC algorithm grows linearly - O(n).

Summary of performance

As can be seen in the cases below, the existing algorithm using regex has linear time complexity based on the the size of the fragment table. The new algorithm is constant time.

...

)

...

.

Work Breakdown

In addition to the changes outlined above, there is additional work remaining for this change to be production-ready. 

...