This is a space to discuss The DynamoDB Book.

If you have a question about a particular strategy, example, or other section of The DynamoDB Book, post it in here.

If you notice typos, errors, or confusing issues, post those here as well!

Note that moderators may occasionally remove posts from this space if they reflect errors that have been fixed in The DynamoDB Book. 
Hi Alex,

In chapter 19.3.4, Modelling the Order Items, the PK and SK for OrderItems are modeled as:
 
OrderItems PK: ORDER#<OrderId>#ITEM#<ItemId>  SK: ORDER#<OrderId>#ITEM#<ItemId>  

I was wondering why you modelled them this way and why not say

OrderItems PK: ORDER#<OrderId>  SK: ITEM#<ItemId>  

or

OrderItems PK: ITEM#<ItemId>  SK: ITEM#<ItemId>  


Kind Regards

Rob
Good question, Rob. Obviously I can't speak for Alex DeBrie, but it seems like for this demo example the schema assumes that OrderItems don't come from anywhere, i.e. they only exist on an order.
Hey Rob, good question. The main point I was trying to show there was that the one-to-many pattern of 'Fetch Order and Order Items' would be handled in the secondary index. Thus, the primary key pattern in the base table was less important.

The second pattern you showed wouldn't have worked because it wouldn't have had enough uniqueness. In this model, the ItemId is an identifier for that particular item. If the PK & SK were both ITEM#<ItemId>, then the item would have been overwritten whenever *anyone* purchased that item in an order.

The first pattern (OrderId as PK, ItemId as SK) would have worked. That said, I generally avoid putting items in the same item collection unless I'm going to be fetching them as part of a Query. By giving each OrderItem its own item collection based on the combination of OrderId and ItemId, you give DynamoDB the ability to spread those items out. It probably doesn't matter too much here, but it's just a practice I try to follow.
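
To make that concrete, here's a rough sketch of the two workable layouts (the IDs and variable names below are placeholders, not from the book's code):

```python
order_id = "1234"   # placeholder OrderId
item_id = "5678"    # placeholder ItemId (the product's identifier)

# Pattern used in the book: each OrderItem gets its own item collection.
book_pattern = {
    "PK": f"ORDER#{order_id}#ITEM#{item_id}",
    "SK": f"ORDER#{order_id}#ITEM#{item_id}",
}

# Rob's first alternative: all OrderItems share the Order's item collection.
alternative_pattern = {
    "PK": f"ORDER#{order_id}",
    "SK": f"ITEM#{item_id}",
}
```

Either way, the 'Fetch Order and Order Items' access pattern is handled in the secondary index, as noted above.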
Hey Alex DeBrie,

I'm just working my way through the Big Time Deals example, and on page 362 in the `create_message` method you assign the message's `Unread` attribute a value of a string "True".

Is there a reason for using string here, rather than DDB's native BOOL type? 
Also, in this use-case you switch from originally using `MESSAGES#<Username>` for the PK, to `MESSAGE#{message.username}` for the PK in the code samples - was that (dropping the `S` off `MESSAGES`) intentional, or just a typo?
Nope, no particular reason on the string vs. bool. I rarely think of the bool type, and I'll need to convert it from a string value ('true') no matter which DynamoDB type it is.
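
For what it's worth, with the boto3 resource API either shape works; passing a Python bool stores DynamoDB's native BOOL type (the table name and key values below are placeholders, not the book's exact values):

```python
import boto3

table = boto3.resource("dynamodb").Table("BigTimeDeals")  # table name assumed

# As in the book's sample: Unread stored as the string "True".
table.put_item(Item={"PK": "MESSAGE#alexdebrie", "SK": "MESSAGE#2020-04-01", "Unread": "True"})

# Using DynamoDB's native BOOL type instead: just pass a Python bool.
table.put_item(Item={"PK": "MESSAGE#alexdebrie", "SK": "MESSAGE#2020-04-01", "Unread": True})
```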

And nice catch on the MESSAGE/MESSAGES part! That's an error on my end. Will update :)
In 13.2.2, Assembling different collections of items, the GitHub example describes getting a repo and all its issues, and similarly a repo and all its stars. It seems like the two access patterns rely on the lexical order of the entries. Scanning forward gives one access pattern and scanning backward gives the other.

I wanted to confirm that this pattern depends on the fact that there are only three types in the SK: ISSUE/REPO/STAR, and that REPO just happens to be in the middle. E.g. if you added FORK/WATCHER it would break this pattern.

Is this a practical example or just something that happens to work every now and then?
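
For reference, the two queries I'm describing look roughly like this with boto3 (the table name, attribute names, and key values are my guesses following the REPO#/ISSUE#/STAR# convention, not the book's exact code):

```python
import boto3

table = boto3.resource("dynamodb").Table("GitHubModel")  # table name is a placeholder

# Assumes the Repo item's SK equals its PK, e.g. "REPO#alexdebrie#dynamodb-book".
repo = "REPO#alexdebrie#dynamodb-book"

# Repo + Issues: "ISSUE#..." sorts before "REPO#...", so read backward from the Repo item.
repo_and_issues = table.query(
    KeyConditionExpression="PK = :pk AND SK <= :repo",
    ExpressionAttributeValues={":pk": repo, ":repo": repo},
    ScanIndexForward=False,  # descending: Repo item first, then Issues
)

# Repo + Stars: "STAR#..." sorts after "REPO#...", so read forward from the Repo item.
repo_and_stars = table.query(
    KeyConditionExpression="PK = :pk AND SK >= :repo",
    ExpressionAttributeValues={":pk": repo, ":repo": repo},
    ScanIndexForward=True,  # ascending: Repo item first, then Stars
)
```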
I'm having the toughest time squaring what the Dynamo team says with what the Amplify team says. It seems AWS has internal conflicts on one- vs. many-table design. One table makes the most sense to me, but Amplify seems to have been built to go against this design and keeps adding features that kind of lock you into a many-table design. Amplify seems to be taking off and powering apps backed by Dynamo, so I'm guessing it will push multi-table design more just by sheer deployment numbers.

Has anyone got any feedback on this? Alex? I don’t want to build a huge app on the wrong architecture that even Amazon can’t seem to agree on. 
Good question, Martin, and it's one I keep seeing. Clearly someone needs to write the definitive post on it :)

I've opined briefly here. Basically, I think GraphQL & AppSync are optimizing for different things: frontend developer happiness and ease of backend code. As part of that, they're accepting some inefficiencies in database access by having a single request make multiple hits to the database.

It's hard to say what the right approach is. I've gotten to where I say it's fine to use multiple tables with GraphQL & AppSync as long as you know and accept the tradeoffs.

There have been a few people (Rich Buggy) that have discussed using single-table design with GraphQL. And that's doable too! At that point, you're making a different tradeoff: more backend complexity in exchange for fewer database hits.
Martin, you're not wrong. Amplify is nice, but it seems like it is abstracting too much away. The documentation around the REST API with a DDB backend is severely lacking and seems very much built around having a single hash key and distinct values. In fact, if you use a hash and range key, it's not very clear what the URL pattern should be to fetch a single item: /path/object/hash/range

I am working through a react/redux course right now, so it is what it is, but I think when I'm done with this I'm going to work on a more lightweight version of Amplify to serve my own sadistic desire to put my data in a single table.

If anyone is interested, I'd be happy to post back here, and I'd welcome anyone who would like to help.
Thanks for posting; I hadn't read that page before. Their suggestion to prefix the sort key with a version seems similar to suffixing the sort key with a date/version, as Alex suggested.

I don't think I like the prefix pattern because it seems more complicated, especially when adding a new item. It looks like finding the next version number would mean using a query filter across all versions, and that gets messy if there are a lot of items or multiple versioned sort key patterns.
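
To illustrate what I mean, the two layouts look roughly like this (the entity names and key values here are made up):

```python
# Version-prefixed sort keys: finding the next version number means checking
# the existing versions for this document first.
prefix_style = [
    {"PK": "DOC#123", "SK": "v1#DOC#123"},
    {"PK": "DOC#123", "SK": "v2#DOC#123"},
]

# Date/version-suffixed sort keys (closer to Alex's suggestion): all versions
# still sort together, and the latest is simply the last item in the collection.
suffix_style = [
    {"PK": "DOC#123", "SK": "DOC#123#2020-04-01"},
    {"PK": "DOC#123", "SK": "DOC#123#2020-05-01"},
]
```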
Alex - looks like you are using the image export from the Workbench for the book.  Do you have a full Data Model for each of the chapters that you could also put in with the code so we could import them into our own workbenches?

On a related point, as it is still relatively new and evolving, have you come across any Workbench best practices for modelling? Perhaps it might justify an extra supplement to the book. I would be interested in:

  • How much data should you have in there?
  • Are facets actually access patterns or item collections?
  • Is there a good way to share the DB.json file with other developers (Git/shared storage)?
  • Tips for editing the DB.json in an editor (is there a VS Code plugin?)
  • Is the Operation Builder of any use, and what about the code-gen?

Thanks.

O.
 
Owain, back in December I created a GitHub repo for DynamoDB Single Table Models (https://github.com/deploystudios/dynamodb-single-table-models). I was on my post-re:Invent/Houlihan buzz, and then holidays, life, etc.

The idea here was to create models that could be imported into NoSQL Workbench. Would like to extend this to include Dynobase as well. I figured it would be great to show different models for things like a CMS, IoT data, etc.

Alex DeBrie, I forgot to mention I gave you a shout-out on that page as well! Let me know if you'd like to use this.


Good thought, Owain. Let me see if I can clean these up and make them available.

My models are a bit spread out, to be honest. I often only want to show a subset of the data (e.g. Repos and Issues, or Deals and Categories), so I end up having ~10 different 'tables' in each model. That made it easier to snapshot particular portions for the book.
In chapter 20 you are using DynamoDB Streams to achieve some business requirements. With single-table design, this means the Lambda that is triggered will receive a lot of traffic that isn't relevant to it and will only act on a small percentage of events. What is your take on that?

Is it worth using DynamoDB Streams with tables of heterogeneous items, or is it better to create a separate table when we need a stream for some reason?
Good question, and filtered streams are on my DynamoDB Wish List :).

That said, it turns out to be a pretty small implementation detail, to the point that I don't think we'll get it from the DDB team soon. In your stream handler, you can filter for the particular type of item you want and operate on it as needed. For events that don't match, just drop them and move on.

The biggest downside is that it can bundle a bunch of logic into one function if you are operating on a few different types of events. However, you can usually just put the actual logic in separate modules and then have a top-level router that filters and dispatches the events as needed.
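
As a rough sketch of that filter-then-route shape (the entity type and PK prefix below are just for illustration, not from the book's code):

```python
def handle_order_created(new_image):
    # Actual business logic for new Order items lives in its own module.
    pass

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # not an event type we care about
        new_image = record["dynamodb"].get("NewImage", {})
        pk = new_image.get("PK", {}).get("S", "")
        # Route based on the item type encoded in the key; drop everything else.
        if pk.startswith("ORDER#"):
            handle_order_created(new_image)
```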
I have a generic Lambda that simply reads the stream, filters out the records you're not interested in, and then publishes the event onto an SNS topic (or EventBridge) with some messageAttributes, so subsequent Lambdas can use a filterPolicy on their subscriptions to listen for what they need. A pub/sub pattern.

I parse the PK for the entityType (single-table model, using # as the delimiter) and set this as the messageType.

messageType in the format: com.example.<entity_type>.<eventName>

I prefer to use POST, PUT and DELETE translated from eventName (INSERT | MODIFY | REMOVE), but that's just me.

I also set a messageId from the eventId so we have a form of correlation id for logs.

The messageType is in a reverse-domain format so that it is easy to use a prefix on the filterPolicy (e.g. listen for all USER events).

I also have other components in most of my PKs, so I set additional message attributes that the rest of the (multi-tenanted) architecture uses. E.g. you can filter the subscription to pick up "all updates to users for a particular customer".
I do this since I use AppSync, which writes directly to the DB without setting some of the headers I need. Hence they are encoded in the key.
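
Roughly, the publish step looks like this (the topic ARN, key format, and attribute names here are placeholders):

```python
import json
import os
import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["TOPIC_ARN"]  # placeholder configuration

# Translate the stream eventName into the verb used in the messageType.
EVENT_NAME_MAP = {"INSERT": "POST", "MODIFY": "PUT", "REMOVE": "DELETE"}

def handler(event, context):
    for record in event["Records"]:
        pk = record["dynamodb"]["Keys"]["PK"]["S"]
        entity_type = pk.split("#")[0].lower()  # entityType parsed from the PK
        message_type = f"com.example.{entity_type}.{EVENT_NAME_MAP[record['eventName']]}"

        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps(record["dynamodb"]),
            MessageAttributes={
                "messageType": {"DataType": "String", "StringValue": message_type},
                "messageId": {"DataType": "String", "StringValue": record["eventID"]},
            },
        )
```

Downstream subscriptions can then set a filterPolicy on messageType (e.g. a prefix match), as described above.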


If you're using EventBridge, it can subscribe to a DynamoDB stream and only invoke your Lambda(s) with matching events.
As I was reading chapter 14, I realized that some of the strategies are very complicated. I have sub-chapter 14.4 in mind, where you have two relational access patterns within a single item collection.

This assumes that your SK can have three different prefixes and that you can choose two of them just by sorting the items.

This seems very clever, but I wanted to ask how frequently you use this approach. I can imagine that in an application with ongoing development you could break such a setup just by introducing a new prefix in the SK. What is your approach here? Do you just write a lot of tests to cover all of your access patterns?

When I read about it I thought, wow, this is nice, but then I wondered: do I want to maintain such a design? I am not so sure.
Great question, Jędrzej. The short answer is to use what feels comfortable to you.

I use the pattern of two relational access patterns in a single item collection without hesitation, because it doesn't feel that complicated to me. The combination of a parent item and its related items in one item collection feels natural. And because I'm used to doing it in one direction, I usually don't hesitate to use it in the other direction.

The maintenance cost feels low because it's usually a 'set it and forget it' situation. After you model out the entities and their access patterns, you write the logic once to add the indexing attributes and handle the access patterns, then rarely touch it again. If you do add new entities, you need to make sure not to step on existing item collections, but you can usually see this from the entity chart.

That said, I know it is a burden. For me, I probably wouldn't use the strategy in Section 14.6 (Faking ascending order) very often. I find that one quite confusing. I wanted to show it as an example of what's possible, but it crosses the legibility barrier for me.

If two relationships in one item collection seems like too much for you, then I'd advise adding an additional index. It may cost a bit more, but it could save you some time in future modeling.

Curious to hear what others think as well :) 