Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Repository queries taking collection parameters are limited to 32K collection items. (This is caused by Hibernate PostgreSQL driver having limit of 32767 bind variables per statement.)
For example, org.onap.cps.spi.repository.FragmentRepository#deleteByAnchorIngiven a FragmentRepository method:

Code Block
languagejava
    Collection<AnchorEntity> findAllByDataspaceAndNameIn(DataspaceEntity dataspaceEntityList<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> anchorNamesxpath);

 This This would throw an exception if supplied with a collection of 32,767 xpaths.

The reason is that JPA/Hibernate queries would translate the command to the following SQL:

Code Block
languagesql
SELECT * FROM fragment WHERE anchor_id = ? AND xpath IN (?,?,?,?...,?)

Each collection item (xpath) is supplied as an SQL query parameter. Hibernate with Postgres has a limit of 32,

...

767 query parameters per SQL statement, so an exception will be thrown with too many.

CPS-1574 is to remove those limits (and write tests to verify).

...

This solution partitions the input collection into lists of 32,000 and calls the original repository method, collecting the results:

Code Block
languagejava
    Collection<AnchorEntity> findAllByDataspaceAndNameIn(DataspaceEntity dataspaceEntityList<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> anchorNamesxpaths); 

    default Collection<AnchorEntity>List<FragmentEntity> findAllByDataspaceAndNameInBatchfindAllByAnchorIdAndXpathInBatch(final DataspaceEntityint dataspaceEntity,
                                                                     anchorId, final List<String> anchorNamesxpaths) {
         Collection<AnchorEntity>  List<FragmentEntity> results = new ArrayList<>(anchorNamesxpaths.size());
        for (final Collection<String> anchorNamesBatchxpathsBatch : Lists.partition(anchorNamesxpaths, 32000)) {
            results.addAll(findAllByDataspaceAndNameInfindAllByAnchorIdAndXpathIn(dataspaceEntityanchorId, anchorNamesBatchxpathsBatch));
        }
        return results;
    }

In addition, if we wish to avoid the overhead of adding the results to a collection when not needed, we may directly call the original when the collection size is less than 32,000:

Code Block
languagejava
    Collection<AnchorEntity> findAllByDataspaceAndNameIn(DataspaceEntity dataspaceEntityList<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> anchorNamesxpaths); 

    default Collection<AnchorEntity>List<FragmentEntity> findAllByDataspaceAndNameInBatchfindAllByAnchorIdAndXpathInBatch(final DataspaceEntityint dataspaceEntity,
                                                                     anchorId, final List<String> anchorNamesxpaths) {
        if (anchorNamesxpaths.size() < 32000) {
            return findAllByDataspaceAndNameInfindAllByAnchorIdAndXpathIn(dataspaceEntityanchorId, anchorNamesxpathsBatch);
        }
        Collection<AnchorEntity>   List<FragmentEntity> results = new ArrayList<>(anchorNamesxpaths.size());
        for (final Collection<String> anchorNamesBatchxpathsBatch : Lists.partition(anchorNamesxpaths, 32000)) {
            results.addAll(findAllByDataspaceAndNameInfindAllByAnchorIdAndXpathIn(dataspaceEntityanchorId, anchorNamesBatchxpathsBatch));
        }
        return results;
    }

Proposal #2:

...

Postgres arrays

This solution uses a third-party library to add PostGres array support to queries.

Comparison