Problem Overview

The REST API of the CRUD webservice A&AI Resources is not consistent under concurrent access to the same entity (calls using the same ID).

If multiple requests modify the same entity concurrently, the result is undefined. In the worst case observed, concurrent access produces duplicates of that entity. No further REST requests (GET/POST/DELETE ...) to the entity are then possible, and the entity becomes inaccessible. Afterwards, only manual intervention in the database can restore proper service behaviour. Access the data concurrently at your own risk.
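The sketch below shows one plausible way such duplicates could arise: an unguarded check-then-create sequence interleaved between two callers. It is not a confirmed diagnosis of the A&AI code. `ToyStore`, `interleaved_puts`, and the ID `entity-1` are hypothetical names invented for this example and do not belong to A&AI or JanusGraph.

```python
# Toy in-memory store for illustration only; NOT the A&AI or JanusGraph API.
class ToyStore:
    def __init__(self):
        # List of (entity_id, payload) pairs with no uniqueness constraint.
        self.rows = []

    def exists(self, entity_id):
        return any(eid == entity_id for eid, _ in self.rows)

    def insert(self, entity_id, payload):
        self.rows.append((entity_id, payload))

    def get(self, entity_id):
        matches = [p for eid, p in self.rows if eid == entity_id]
        if len(matches) > 1:
            # A lookup by ID that expects one result cannot pick a winner.
            raise LookupError(f"{len(matches)} entities share id {entity_id!r}")
        return matches[0] if matches else None


def interleaved_puts(store, entity_id):
    # Deterministic replay of a bad interleaving: both hypothetical clients
    # run their existence check before either of them writes.
    a_sees_entity = store.exists(entity_id)
    b_sees_entity = store.exists(entity_id)
    if not a_sees_entity:
        store.insert(entity_id, "from client A")
    if not b_sees_entity:
        store.insert(entity_id, "from client B")


store = ToyStore()
interleaved_puts(store, "entity-1")
duplicate_count = len(store.rows)

try:
    store.get("entity-1")
    lookup_failed = False
except LookupError:
    lookup_failed = True
```

In this toy model, both writes succeed, and every later lookup by ID fails until one of the two rows is removed by hand. That mirrors the symptom described above.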

It is recommended that only 1 client calls the A&AI REST services at any time.
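Where several callers cannot be avoided, one way to approximate this recommendation is to route all calls through a single process that serializes requests per entity ID. The following is a minimal sketch under that assumption. `PerEntitySerializer` and `put_if_absent` are hypothetical names, the shared list stands in for the database, and the approach offers no protection once a second process talks to A&AI directly.

```python
import threading
from collections import defaultdict


class PerEntitySerializer:
    """Runs at most one request per entity ID at a time (single process only)."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table
        self._locks = defaultdict(threading.Lock)   # one lock per entity ID

    def run(self, entity_id, request):
        with self._guard:
            lock = self._locks[entity_id]
        with lock:
            # The whole request, including any check-then-write sequence,
            # executes while holding the lock for this entity ID.
            return request()


rows = []  # stand-in for the database contents


def put_if_absent(entity_id):
    # Unguarded check-then-create; safe here only because of the serializer.
    if entity_id not in rows:
        rows.append(entity_id)


serializer = PerEntitySerializer()
threads = [
    threading.Thread(
        target=serializer.run,
        args=("entity-1", lambda: put_if_absent("entity-1")),
    )
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Requests for different IDs still run in parallel, because each ID has its own lock. Only requests for the same ID wait for each other.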

Technical aspects of the problem

It is unclear what causes this problem. The most probable cause is that the code calling the JanusGraph library is buggy. To guarantee data consistency, special precautions have to be taken, as described in Chapter 31, Eventually-Consistent Storage Backends, of the JanusGraph documentation. Unfortunately, the JanusGraph features that ensure data consistency are not used in the current A&AI code.

Path forward

There are 4 possible solutions (subjectively sorted from worst to best):

  1. Recommend using at most 1 A&AI REST client.
    • PROS: the current state
    • CONS: not what would be expected of a carrier-grade system by any rational measure
  2. Have batch jobs scan the database and remove duplicates and malformed data created by inconsistencies (this is currently used to mitigate the problem).
    • PROS: you don't have to deal with the real problem; it is easy because you only target symptoms
    • CONS: does not work, because data that gets corrupted between batch sweeps is unavailable until the batch job runs again. It also adds accidental complexity through the batch jobs and their timing.