
I have a DynamoDB table with a partition key id and attributes status (boolean) and type (string with two possible values). I need to implement pagination for results filtered on status and type. My current approach uses Scan with a FilterExpression, setting Limit to 2 * desiredResultSize. I accumulate filtered results until reaching desiredResultSize, then generate a nextToken for the client to use in the next request, based on the last filtered item. However, I'm facing a few potential issues:

  1. The Scan + filter might not find any qualifying items.
  2. I'm not sure whether the last qualifying item found is truly the last one in the table. Ideally, if there are no qualifying items in the rest of the table, I should return null as the nextToken to the client.
  3. There will be many Scan operations, which could be slow. My table will hold roughly 2k items.

Here is my current implementation:
public PaginatedResult<List<ItemRecord>> getFilteredItems(String nextToken, int resultSize) {
    Map<String, AttributeValue> exclusiveStartKey = decodeNextToken(nextToken);

    // "status" and "type" are DynamoDB reserved words, so they must be
    // referenced through expression attribute names.
    Expression filterExpression = Expression.builder()
            .expression("#status = :status AND #type IN (:type1, :type2)")
            .putExpressionName("#status", "status")
            .putExpressionName("#type", "type")
            .putExpressionValue(":status", AttributeValue.fromBool(true))
            .putExpressionValue(":type1", AttributeValue.fromString("typeValue1"))
            .putExpressionValue(":type2", AttributeValue.fromString("typeValue2"))
            .build();

    ScanEnhancedRequest.Builder scanEnhancedRequestBuilder = ScanEnhancedRequest.builder()
            .filterExpression(filterExpression)
            .limit(resultSize);

    // Accumulate results until resultSize is reached or the table is exhausted,
    // then build a nextToken from the last item in filteredItemsList.
    List<ItemRecord> filteredItemsList = new ArrayList<>();
    Map<String, AttributeValue> currentExclusiveStartKey = exclusiveStartKey;
    boolean foundAllQualifiedItems = false;
    while (!foundAllQualifiedItems && filteredItemsList.size() < resultSize) {
        try {
            ScanEnhancedRequest scanEnhancedRequest = scanEnhancedRequestBuilder
                    .exclusiveStartKey(currentExclusiveStartKey).build();
            Page<ItemRecord> page = table.scan(scanEnhancedRequest)
                    .stream().findFirst().orElse(null);
            if (page == null) {
                break;
            }

            // A page can be empty (the filter removed every item in it) while
            // lastEvaluatedKey is still non-null, so do not stop on an empty page.
            for (ItemRecord itemRecord : page.items()) {
                if (filteredItemsList.size() < resultSize) {
                    filteredItemsList.add(itemRecord);
                }
            }

            currentExclusiveStartKey = page.lastEvaluatedKey();
            if (currentExclusiveStartKey == null) {
                // The scan reached the end of the table.
                foundAllQualifiedItems = true;
            }
        } catch (Exception e) {
            // handle it
        }
    }

    if (filteredItemsList.isEmpty()) {
        if (nextToken == null) {
            throw new NotFoundException("Qualified items not found");
        }
        return new PaginatedResult<>(filteredItemsList, Optional.empty(), resultSize);
    }

    Map<String, AttributeValue> lastEvaluatedKey;
    if (foundAllQualifiedItems) {
        lastEvaluatedKey = null;
    } else {
        // Build the next page's start key from the last item actually returned,
        // not from the scan's own lastEvaluatedKey.
        lastEvaluatedKey = ConversionHelper.itemToAttributeValueMap(
                filteredItemsList.get(filteredItemsList.size() - 1));
    }

    String newNextToken = ConversionHelper.encodeNextToken(lastEvaluatedKey);
    return new PaginatedResult<>(filteredItemsList, Optional.ofNullable(newNextToken), resultSize);
}

Thank you for any suggestion!

1 Answer


Your application should keep fetching until it has accumulated enough items to return to the user; the key of the last item you return becomes the LastEvaluatedKey you hand back as the nextToken.
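For example, the nextToken can be an opaque Base64 encoding of the last returned item's key. A minimal sketch using only the JDK; `NextTokenCodec` is a hypothetical helper, and the key map is simplified to String values rather than the SDK's AttributeValue:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

public class NextTokenCodec {
    // Encode a primary-key map (simplified to String values) as an opaque token.
    static String encodeNextToken(Map<String, String> lastKey) {
        if (lastKey == null) {
            return null; // no more pages
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : lastKey.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return Base64.getUrlEncoder().encodeToString(
                sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Decode the token back into the ExclusiveStartKey for the next request.
    static Map<String, String> decodeNextToken(String token) {
        if (token == null) {
            return null; // first page: no ExclusiveStartKey
        }
        String decoded = new String(Base64.getUrlDecoder().decode(token),
                StandardCharsets.UTF_8);
        Map<String, String> key = new LinkedHashMap<>();
        for (String pair : decoded.split("&")) {
            String[] kv = pair.split("=", 2);
            key.put(kv[0], kv[1]);
        }
        return key;
    }
}
```

The token stays opaque to the client, so you can later change its contents (e.g. to include an index sort key) without breaking the API contract.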

While using a Scan with a filter, you can't know in advance whether any more qualifying items remain in the table; the only definitive signal is a response with no LastEvaluatedKey, which means the scan reached the end.

I would suggest changing your data model to use a secondary index and Query instead of Scan; it's much more predictable, not to mention more performant and cost-effective.
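A sketch of what that could look like with the enhanced client, assuming a GSI (here called "type-index", a hypothetical name) whose partition key is type. Note that IN is not allowed in a key condition, so you query one type value per request (or fan out over both); status still goes in a filter unless you fold it into the index key:

```java
// Hypothetical GSI named "type-index" with partition key "type".
// A Query against it reads only matching items, unlike a Scan + filter,
// which reads (and bills for) every item it touches.
DynamoDbIndex<ItemRecord> typeIndex = table.index("type-index");

QueryEnhancedRequest request = QueryEnhancedRequest.builder()
        .queryConditional(QueryConditional.keyEqualTo(
                Key.builder().partitionValue("typeValue1").build()))
        // "status" still has to be filtered (it is a reserved word,
        // hence the expression attribute name).
        .filterExpression(Expression.builder()
                .expression("#status = :status")
                .putExpressionName("#status", "status")
                .putExpressionValue(":status", AttributeValue.fromBool(true))
                .build())
        .limit(resultSize)
        .exclusiveStartKey(exclusiveStartKey)
        .build();

Page<ItemRecord> page = typeIndex.query(request).stream().findFirst().orElse(null);
```

With only ~2k items either approach will work, but the Query keeps latency and consumed capacity proportional to the result set rather than the table size.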
