How to Filter DynamoDB Results Using Query and Scan, and Implement a Full-Text Search

DynamoDB is Amazon's fully managed NoSQL database service, offering fast and predictable performance with seamless scalability. It is a good fit for projects that need single-digit-millisecond read/write latency and easy integration with other AWS services such as Lambda and Elasticsearch.

In essence, DynamoDB provides a couple of ways to filter the returned results.

How to Filter in DynamoDB Using Query and Scan

When creating a table in DynamoDB, you first have to specify a Partition Key. On its own, or combined with a Sort Key if one is defined, it uniquely identifies each item in the table.

The second, optional parameter is the "Sort Key". As the name implies, it determines the order in which items sharing the same partition key are stored and returned. It can also be used in a query's key condition to narrow down the returned data.

As a third option, we can use a "GSI", which stands for Global Secondary Index. You can create multiple GSIs to cover various data retrieval patterns.

As an example, consider the following item in the DynamoDB table "apartments_to_rent", which contains a nested attribute named "rentor".

{
    "apartment_id": {"N": "666"},
    "rentor": {
        "M": {
            "id": {"N": "1234"},
            "name": {"S": "John Doe"},
            "rented_on": {"N": "1234123421"}
        }
    }
}

Here, N, S, and M are data type descriptors: DynamoDB's low-level JSON format wraps every value in an object that states its type. Converting plain values into this format and back is known as "marshalling" and "unmarshalling", and the AWS SDKs provide helpers for both.

Each letter has its own role:
N stands for Number, representing numeric values placed in the database. Numbers are sent as strings.
S stands for String, representing any textual value, such as a name, location, etc.
M stands for Map, representing a nested object such as "rentor".
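
To make the type descriptors more concrete, here is a minimal sketch of what marshalling does. The toAttributeValue helper is hypothetical and covers only strings, numbers, booleans, and nested objects; in a real project you would rely on the converter that ships with the AWS SDK rather than on code like this.

```typescript
// Simplified illustration only; real projects should use the SDK's own converter.
type AttributeValue =
    | { S: string }
    | { N: string }
    | { BOOL: boolean }
    | { M: { [key: string]: AttributeValue } };

type Plain = string | number | boolean | { [key: string]: Plain };

// Hypothetical helper: wraps a plain value in its DynamoDB type descriptor.
function toAttributeValue(value: Plain): AttributeValue {
    if (typeof value === "string") return { S: value };
    // DynamoDB transmits numbers as strings.
    if (typeof value === "number") return { N: String(value) };
    if (typeof value === "boolean") return { BOOL: value };
    const map: { [key: string]: AttributeValue } = {};
    for (const key of Object.keys(value)) {
        map[key] = toAttributeValue(value[key]);
    }
    return { M: map };
}

const marshalledRentor = toAttributeValue({
    id: 1234,
    name: "John Doe",
    rented_on: 1234123421,
});
```

The result has the same shape as the "rentor" attribute shown above: a map whose members each carry their own N or S descriptor.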

As another example, imagine that the same object is saved in plain JSON format.

"rentor": {
    "id": "1234",
    "name": "John Doe",
    "rented_on": "1234123421"
}

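We can also use the "id" of this nested object as a "GSI Partition Key" by specifying it when creating the resources. Keep in mind that DynamoDB key attributes must be top-level attributes of type String, Number, or Binary, so the index reads a top-level attribute literally named "rentor.id"; the nested value has to be stored under that name as well. The "AWS CDK" code listed below illustrates the index definition.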

this.table.addGlobalSecondaryIndex({
    indexName: "index-rentor",
    partitionKey: { name: "rentor.id", type: dynamodb.AttributeType.STRING },
});

this.table is an instance of the "Table" class that creates and defines the DynamoDB table. On this instance we call the addGlobalSecondaryIndex method, passing the index name and the attribute that will serve as the index's partition key for queries. That's all you need to define the "GSI".
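
Since index keys have to be top-level attributes, one possible way to feed the index is to copy the nested id into a top-level attribute before writing the item. The sketch below assumes the plain JSON shape from the earlier example; the Apartment and Rentor interfaces and the withIndexKey helper are hypothetical names used for illustration only.

```typescript
interface Rentor {
    id: string;
    name: string;
    rented_on: string;
}

interface Apartment {
    apartment_id: number;
    rentor: Rentor;
}

// Hypothetical helper: copies the nested id into a top-level attribute
// whose name matches the index partition key ("rentor.id").
function withIndexKey(apartment: Apartment): Apartment & { "rentor.id": string } {
    return { ...apartment, "rentor.id": apartment.rentor.id };
}

const itemToStore = withIndexKey({
    apartment_id: 666,
    rentor: { id: "1234", name: "John Doe", rented_on: "1234123421" },
});
```

The nested object stays untouched, so existing readers of "rentor" keep working while the index picks up the duplicated value.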

How to filter by a nested structure?

DynamoDB provides two main mechanisms to retrieve data in a filtered manner: query and scan.

A query generally performs better than a scan.

The reason lies in the way DynamoDB works under the hood. A query looks up items by the Partition Key that is passed in and reads only the items that match.

This gives the query operation a clear advantage in execution time and cost.

Scan operations, on the other hand, have a different use case. You can use them for smaller tables with a reasonable number of records.

A scan reads every item in the table and only then applies the filter criteria, so you pay for everything that was read, not just for what is returned. This can affect both budget and execution time, although a scan does have some advantages over a query.

For example, a query's key condition only works on flat key attributes such as a string, number, or binary value, but not on an object.

A scan filter, however, can reach into nested attributes, and multiple object keys can be specified in any order. Moreover, scan operations allow using "contains" in the filter expression, which performs a case-sensitive substring match and can serve as a basic full-text search.
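
As a rough illustration of a "contains" search, the sketch below builds scan parameters for a nested attribute. The buildContainsScan helper and the ScanParams interface are hypothetical; the resulting object follows the parameter names used by the scan operation and is assumed to be sent through a client that accepts plain values.

```typescript
interface ScanParams {
    TableName: string;
    FilterExpression: string;
    ExpressionAttributeNames: { [placeholder: string]: string };
    ExpressionAttributeValues: { [placeholder: string]: string };
}

// Hypothetical helper: builds scan parameters that keep only items whose
// attribute at `path` (e.g. ["rentor", "name"]) contains `term`.
function buildContainsScan(tableName: string, path: string[], term: string): ScanParams {
    const names: { [placeholder: string]: string } = {};
    // Each path segment gets its own placeholder: #p0, #p1, ...
    const placeholders = path.map((segment, index) => {
        const placeholder = `#p${index}`;
        names[placeholder] = segment;
        return placeholder;
    });
    return {
        TableName: tableName,
        FilterExpression: `contains(${placeholders.join(".")}, :term)`,
        ExpressionAttributeNames: names,
        ExpressionAttributeValues: { ":term": term },
    };
}

const searchParams = buildContainsScan("apartments_to_rent", ["rentor", "name"], "John");
```

Because the match is case-sensitive, a search for "john" would not find "John Doe"; storing a lowercased copy of the searchable text is one common workaround.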

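How can we implement search by GSI?

To search by the rentor ID, we can reuse the nested structure from the earlier example and pass the required parameters. Note that the code below runs a scan with a filter expression on the nested attribute; it does not read from the index itself.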

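import { DynamoDB } from "aws-sdk"; // assumes the AWS SDK for JavaScript v2

// DocumentClient accepts plain JavaScript values, so no N/S descriptors are needed.
const documentClient = new DynamoDB.DocumentClient();

const params: DynamoDB.DocumentClient.ScanInput = {
    TableName: "tableName",
    // "#rentor.#id" is a document path to the nested "id" attribute.
    FilterExpression: "#rentor.#id = :rentorId",
    ExpressionAttributeNames: {
        "#rentor": "rentor",
        "#id": "id"
    },
    ExpressionAttributeValues: {
        ":rentorId": "abcd-efgh-abcd-efgh" // UUID id from the database
    }
};

async function findByRentorId() {
    // A single scan call reads at most 1 MB of data; errors propagate to the caller.
    const scanOutput = await documentClient.scan(params).promise();
    return scanOutput.Items;
}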

In this example, we declared a variable named params that holds the input parameters: the table to be searched and the filter expression.

The placeholders in the filter string are resolved through the "ExpressionAttributeNames" and "ExpressionAttributeValues" properties. In "ExpressionAttributeNames" we have added two key-value pairs that map #rentor and #id to the actual attribute names.

In "ExpressionAttributeValues" we have added the values that will be passed in our API request for retrieving data.

Once these parameters have been built, we can pass them to the scan method of AWS DynamoDB and return the "Items" found.
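
A single scan call returns at most 1 MB of data and reports a LastEvaluatedKey when more items remain, so larger tables need a loop. The sketch below shows one possible way to collect all pages. The scanAll helper and the Page interface are hypothetical, and scanPage stands in for the real SDK call so that the example does not depend on a particular SDK version.

```typescript
interface Page<T> {
    Items?: T[];
    LastEvaluatedKey?: { [key: string]: unknown };
}

type StartKey = { [key: string]: unknown } | undefined;

// `scanPage` is expected to run one scan call, passing the given key
// as ExclusiveStartKey, and to resolve with that call's output.
async function scanAll<T>(scanPage: (startKey: StartKey) => Promise<Page<T>>): Promise<T[]> {
    const items: T[] = [];
    let startKey: StartKey = undefined;
    do {
        const page: Page<T> = await scanPage(startKey);
        items.push(...(page.Items ?? []));
        // A missing LastEvaluatedKey means the whole table has been read.
        startKey = page.LastEvaluatedKey;
    } while (startKey !== undefined);
    return items;
}
```

With the AWS SDK v2 DocumentClient, scanPage could be written as (key) => documentClient.scan({ ...params, ExclusiveStartKey: key }).promise(), assuming a documentClient instance and params object are in scope.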

Note that the scan method is convenient but comes with pros and cons. For bigger tables, it is better to store the ID in a separate top-level attribute, index it, and query the results. Scanning a large table is slower and consumes more resources.
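
For comparison, a query against the "index-rentor" index defined earlier might be built as in the sketch below. It assumes that each item carries a top-level attribute named "rentor.id"; the buildRentorQuery helper and the QueryParams interface are hypothetical names.

```typescript
interface QueryParams {
    TableName: string;
    IndexName: string;
    KeyConditionExpression: string;
    ExpressionAttributeNames: { [placeholder: string]: string };
    ExpressionAttributeValues: { [placeholder: string]: string };
}

// Hypothetical helper: builds query parameters for the "index-rentor" index.
function buildRentorQuery(tableName: string, rentorId: string): QueryParams {
    return {
        TableName: tableName,
        IndexName: "index-rentor",
        KeyConditionExpression: "#rentorId = :rentorId",
        // The attribute name contains a dot, so it must go through a placeholder;
        // written inline it would be parsed as a nested document path.
        ExpressionAttributeNames: { "#rentorId": "rentor.id" },
        ExpressionAttributeValues: { ":rentorId": rentorId },
    };
}

const queryParams = buildRentorQuery("apartments_to_rent", "abcd-efgh-abcd-efgh");
// With the AWS SDK v2 DocumentClient this could be sent as:
// const output = await documentClient.query(queryParams).promise();
```

Unlike the scan, this reads only the items whose index key matches, which is where the time and cost advantage of a query comes from.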
