Seth Geoghegan


Entity Relationships - Secondary Indexes vs Managing Manually

Imagine I have a one-to-many relationship between Users and Orders.  I have the following access patterns:

  1. Fetch Order by Order ID
  2. Fetch Orders by User ID

Users are modeled with PK of USER#<user_id>
Orders are modeled with PK of ORDER#<order_id>

I have two approaches I'd like to compare to implement the second access pattern.

Approach 1: Global Secondary Indexes

The GSI PK could be USER#<user_id> and GSI SK could be ORDER#<order_id>.

Approach 2: Manually Manage the Relationship without a GSI

In this approach, my application logic could insert an Order item into the User partition (PK = USER#<user_id> SK = ORDER#<order_id>) and insert an item to represent the Order by itself (PK = ORDER#<order_id>).  I would need to manually manage the relationship between Users and Orders, but I would not need to use a GSI.
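
To make the comparison concrete, here is a minimal sketch of the items each approach would write. The attribute names GSI1PK and GSI1SK, the function names, and the SK on the standalone Order item are assumptions for illustration only. No DynamoDB calls are made; the functions just build the item dictionaries.

```python
def order_items_with_gsi(user_id: str, order_id: str) -> list[dict]:
    """Approach 1: a single Order item; the GSI exposes it under the user."""
    order_key = f"ORDER#{order_id}"
    return [
        {
            "PK": order_key,
            "SK": order_key,               # assumed: SK mirrors PK
            "GSI1PK": f"USER#{user_id}",   # assumed GSI attribute names
            "GSI1SK": order_key,
        }
    ]


def order_items_manual(user_id: str, order_id: str) -> list[dict]:
    """Approach 2: the application writes two items and keeps them in sync."""
    order_key = f"ORDER#{order_id}"
    return [
        # Copy of the Order stored inside the User partition.
        {"PK": f"USER#{user_id}", "SK": order_key},
        # Standalone Order item for direct lookup by Order ID.
        {"PK": order_key, "SK": order_key},  # assumed: SK mirrors PK
    ]
```

With Approach 1 there is one item to write and DynamoDB maintains the index; with Approach 2 every create, update, or delete of an Order touches two items.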

This example is contrived, and is meant to illustrate my question.  There are times where I want to access the child in a parent-child relationship without going through the parent.  Secondary indexes feel like the "right" way to handle this, but the second approach seems viable as well.

Is one approach preferred over another?  Are there times when both approaches make sense?

Handling User Messages

Noticed a small inconsistency in Chapter 20 (Big Time Deals data modeling walkthrough) regarding modeling user messages.

The text says:

For the MessageId, we’ll stick with the KSUID that we used for Deals and discussed in Chapter 14. 
The code examples (create_message, mark_messages_read, etc.) all use "created_at" as the SK, and the screenshots show timestamps in that column.  No biggie, but it just stuck out :)

Fetching Most Recent Deals - Truncated Timestamp

I believe I've found a mistake in Chapter 20.3.1 (the Deals modeling example).  The discussion of truncated timestamps says:

You can truncate down to any granularity you like—hour, day, week, month, year—and your choice will depend on how frequently you’re writing items. 
The issue is with using truncated timestamps to model weeks.  I believe you can only truncate down to the fields the timestamp represents: second, minute, hour, day, month, year (not week).  I came across this issue while trying to model something on a weekly boundary.  I'd love to be mistaken, since it would solve my data modeling issue :)
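
One possible workaround is to compute the weekly boundary in application code instead of dropping fields from the timestamp string. This is a minimal sketch, assuming weeks start on Monday and timestamps are in UTC. truncate_to_week is a hypothetical helper, not something from the book.

```python
from datetime import datetime, timedelta, timezone


def truncate_to_week(ts: datetime) -> datetime:
    """Return midnight on the Monday of the week that contains ts."""
    start_of_day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # weekday() is 0 for Monday, so subtracting it lands on Monday.
    return start_of_day - timedelta(days=ts.weekday())


example = datetime(2020, 7, 15, 13, 45, tzinfo=timezone.utc)  # a Wednesday
week_start = truncate_to_week(example)
print(week_start.isoformat())  # 2020-07-13T00:00:00+00:00
```

The resulting value could then be formatted into a partition key the same way a day- or month-truncated timestamp would be.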

Assembling Different Collections of Items

In Chapter 13 of The DynamoDB Book, Alex DeBrie gives an example of an access pattern that relies on the arrangement of sort keys (section 13.2.2):

I understand this specific example may have been contrived to illustrate the pattern, but it's not clear to me when this pattern would be useful.  

In this example, if you were trying to get all ISSUES for a given REPO, why not query for items where the SK begins with "ISSUES#"?  This access strategy is interesting, but I'm not clear about what it gives us that we don't already get with a "begins with" constraint on the SK.
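
To illustrate the alternative I have in mind, here is a minimal sketch that simulates a "begins with" condition over the sort keys in one partition. The sort key values are hypothetical and the lookup runs against an in-memory list, not DynamoDB.

```python
# Hypothetical sort keys in a single REPO partition, in sorted order
# (DynamoDB stores items within a partition ordered by sort key).
sort_keys = sorted(
    ["REPO#dynamo-demo", "ISSUES#0002", "STAR#alice", "ISSUES#0001"]
)


def begins_with(keys: list[str], prefix: str) -> list[str]:
    """Return the keys that start with prefix, preserving sort order."""
    return [key for key in keys if key.startswith(prefix)]


issues = begins_with(sort_keys, "ISSUES#")
print(issues)  # ['ISSUES#0001', 'ISSUES#0002']
```

Note that the prefix match returns only the issue items; the REPO# item in the same partition is not part of the result.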


Modeling Race Results

First time DynamoDB user checking-in for feedback about my first DynamoDB data modeling attempt!  The DynamoDB Book has been an amazing resource, I'm thankful it exists!

I am building an application that collects and presents running race results (e.g. 5k, 10k, marathon, etc).  This is how I'm modeling the Race, User and Result entities.

One of my basic access patterns is to fetch a sorted list of results by race.  I've implemented this access pattern by defining a secondary index on the RESULT entity, which includes a composite sort key.  The index GSI1PK is RACE#<race_id>, the GSI1SK is TIME#<time_in_seconds>.  This allows me to get sorted race results for a specific race by time:

This covers the basic needs of my application.  However, I have additional access patterns that require fetching results by gender (top male and top female results per race) and by age group per race (14 years old and under, 15-19, 20-24, etc.). This is where I'm a bit stuck and could use some guidance.  My thoughts on a few options:

  1. Push this off to the client - It would be simpler if I made this a client-side concern.  If my data layer returns all the required information about a race result (age/gender/time) it would be trivial to create arbitrary groupings.  
  2. Filter existing results - each RESULT can include a user gender/age attribute, which I could then filter.  This would allow me to take advantage of my existing model without creating new indexes or other complicated trickery :)
  3. Create additional indexes - Taking advantage of gender being an enumeration (M or F), I could create a composite sort key that includes gender, e.g. TIME#M#1200.  I could also treat age groups as enumerated values (e.g. 2024 for 20-24), and combine the two enumerations (e.g. 2024M and 2024F, etc).  However, this is starting to feel complex.
  4. Create a complex attribute on the RACE item that stores age group/gender lists/maps.  I don't have any access patterns on the values in age/gender groupings.  I'm not sure about this pattern for this use case, since a race has an unbounded list of results.  

  I'm new to modeling in DynamoDB and I feel like I'm wading in an ocean of possibilities!  Any advice or guidance would be greatly appreciated!