How to filter DynamoDB records effectively using Query or Scan and implement a full-text search?

DynamoDB is the Amazon take on fully managed NoSQL database services that provide fast and predictable performance with seamless scalability. It is a great solution for projects that require single-digit millisecond response of read/write operations and easy integration with other AWS services like Lambda, ElasticSearch, etc.

By essence DynamoDB provides a couple of ways to filter the returned results. First let’s look at the properties that can be provided as a criteria.

On creation of a Table in DynamoDB you have to specify a Partition Key by which you will have a uniquely identifiable record in the database.

The secord parameter by which you can further refine the list of returned data is the Sort key. The name implies that it is used for sorting purposes but in fact it can be also used as an uniquefilter for the outgoing data from the DB.

The third option that DynamoDB provides to its users is the GSI (global secondary index). You can create as many global secondary indexes as you need and this is a great way to cover various usage patterns for data retrieval. The GSIs also provide the capability to specify nested structures as the values. So let’s say we have the following structure in a column of the DynamoDB table “apartments_to_rent” named “rentor”:

{
“apartment_id: {N: 666},
"rentor": {
    "id": {N: 1234}
    "name": {S: "John Doe"}
    "rented_on": {N: 1234123421}
}
}

If you are curious to understand what N and S stand for in the complex structured values – it’s just DynamoDB’s way to “marshall” and “unmarshall” data – in other words – to keey track of the type of the value stored and not just the value itself. N stands for “Number” and S for “String”. Let’s forget about that for the purpose of this example, and imagine that the object is saved as a plain JSON.

"rentor": {
    "id": "1234"
    "name": "John Doe"
    "rented_on": "1234123421"
}

To make the “id” of this nested object a GSI Partition Key, we have to specify it when we create our resources. We will use AWS CDK code below to illustrate the example. It will look something like this:

this.table.addGlobalSecondaryIndex({
    indexName: "index-rentor",
partitionKey: {name: "rentor.id", type: dynamodb.AttributeType.STRING},
});

The this.table is an instance of an “Table” class that creates and defines the DynamoDB table. Using this instance we execute the method addGlobalSecondaryIndex by passing the name that the index will have and binding it with the nested structure that we will use as a range key, to be used later for queries.

That’s all you need to define the gsk.

Now for the interesting part! How do we filter by such a nested structure?

DynamoDB provides two main mechanisms to retrieve data in a filtered manner – those methods are query and scan. They both have their use cases, and I will explain which one to use for what now.

The query method is the better performer compared to the scan method. The reason for that lies in the way DynamoDB works under the hood. Query is searching for the value by the passed Partition Key and is only returning the results that match. This technique gives the query operation a greater advantage in execution time and cost for the pocket of the consumer.

You can use the scan operation for smaller tables that have a reasonable amount of records. This statement should be taken really in mind, because scan gets the full data of the table first and then starts to filter by some criteria. This will for sure impact the budget of the client and the execution time. Despite that scan has some advantages compared to query and for example one of them is nested querying of data. Yes! Query doesn’t allow it. Query allows you to do your search on a flat structure like a string, number, boolean but not on an object. Scan on the other hand doesn’t even have a limit on the depth of that object; you just have to specify all of the object keys in the valid order. On top of that scan allows you to use the “contains” keyword to make a full text body search, neat!

So sorry, but query is out of the question for this topic.

How do we implement the search by this gsk that we’ve created previously. Lets explain the params that have to be passed and how the method works. Example request:

const params: ScanInput = {
    TableName: "tableName",
    FilterExpression: "#rentor.#id = :rentorId",
    ExpressionAttributeNames: {
        "#rentor": "rentor",
        "#id": "id"
    },
    ExpressionAttributeValues: { 
        ":rentorId": "abcd-efgh-abcd-efgh" // UUID id from the database 
    }
};

try {
 const queryOutput = await DynamoDB().scan(paramsQuery).promise();
    return queryOutput.Items;
} catch (e) {
    throw e;
}

So what just happened? First we declared a variable named params in which we constructed our input parameters. We started off with the table that we will be searching, after that we created our filter selection. The filter string is binded in the “ExpressionAttributeNames” and “ExpressionAttributeValues” properties. As you can see in “ExpressionAttributeNames” we have added two key value pairs, first is the column name and then the nested “id” of the object. The “ExpressionAttributeValues” are the values we have passed in our API request for the data.

After we have built the requirements for the filtration we just need to pass them to the scan method of DynamoDB and return the “Items” it has found.

This method comes with its pro’s and con’s. Yes! we can filter by nested object attributes but we have to be careful of the table size we want to filter. On bigger tables it will be a better solution to create a separate column in the db with the id and query the results and not do the scan technique we just covered.

Hope this helps you on your quest to greatness!

Posted in AWS.