Flavors of NoSQL

Oct 4, 2024

—

A few weeks ago I compared relational databases with NoSQL databases and talked a bit about when to use one versus the other (although each reason deserves a post of its own). Today I want to talk about the different types of NoSQL.

One of the main advantages of using a different kind of store is scalability. It’s easier to scale horizontally (add more servers or nodes), and depending on how the nodes are set up, the system can become highly available through partitioning, data replication and disaster recovery (all of this is also available in a relational database, but it is harder to set up, requires more space, or recovers less quickly).

Now, on to the flavors…

Different Flavors

NoSQL basically translates to “non-relational database”; some NoSQL databases even have SQL-like syntax. What this means is that under the NoSQL umbrella there are many ways of storing information, and each one has its own way of organizing data, its own rules to follow and its own pros and cons.

Document Store

This is probably the most used NoSQL flavor. Document stores keep the information in separate documents, usually in JSON or a similar format like BSON. Each document is self-contained.

This type of storage allows for flexibility when it comes to schema: basically any object can be serialized and stored, so hierarchical data is easy to store.
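As a rough illustration of that flexibility, here is a minimal sketch that uses a plain Python dict as a stand-in for a document collection. The `collection`, `save` and `load` names are made up for this example and are not any real database's API.

```python
import json

# Hypothetical in-memory stand-in for a document collection: id -> JSON text.
collection = {}

def save(doc_id, obj):
    """Serialize any JSON-compatible object and store it as one document."""
    collection[doc_id] = json.dumps(obj)

def load(doc_id):
    """Read a document back into plain Python objects."""
    return json.loads(collection[doc_id])

# A nested (hierarchical) order is stored as a single self-contained document.
save("order-1", {
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "book", "qty": 2},
        {"sku": "pen", "qty": 5},
    ],
})

# Documents in the same collection do not have to share a schema.
save("order-2", {"customer": {"name": "Linus"}, "gift": True})

order = load("order-1")
total_qty = sum(item["qty"] for item in order["items"])
```

Both orders live side by side even though they have different fields, and the nested items come back exactly as they were stored.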

I feel that document stores change the focus from how to store the data to how to use it. Since objects are simply stored as they are, the system becomes more code-centric. Complex object relations can be maintained fairly easily by just storing the object.

A big disadvantage is that queries across documents and joins are more difficult and often have to be done in code. Because of that, non-hierarchical relationships are slightly more difficult to maintain, since there are no foreign key constraints. Also, running reports is a lot harder since the data has no strict schema and is spread out over different nodes.
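To make "joins in code" concrete, here is a small sketch of an application-side join. The `users` and `posts` lists are invented sample documents, and `join_posts_with_authors` is a hypothetical helper, not something a particular database provides.

```python
# Hypothetical documents, as they might come back from two collections.
users = [
    {"_id": "u1", "name": "Ada"},
    {"_id": "u2", "name": "Linus"},
]
posts = [
    {"_id": "p1", "author_id": "u1", "title": "Hello"},
    {"_id": "p2", "author_id": "u1", "title": "NoSQL flavors"},
    {"_id": "p3", "author_id": "u9", "title": "Orphan"},  # no such user
]

def join_posts_with_authors(posts, users):
    """Application-side join: attach each post's author name, if the author exists."""
    users_by_id = {u["_id"]: u for u in users}
    joined = []
    for post in posts:
        author = users_by_id.get(post["author_id"])
        joined.append({
            "title": post["title"],
            "author": author["name"] if author else None,
        })
    return joined

rows = join_posts_with_authors(posts, users)

# Nothing stopped "p3" from pointing at a missing user, so the code has to notice.
orphans = [r["title"] for r in rows if r["author"] is None]
```

The orphaned post is the kind of thing a foreign key would have rejected at write time; here it only shows up when the code goes looking for it.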

Key-Value Store

Just like a hash table, here we store the information as a unique key and a value. The value itself can be anything: a serialized object, binary data, encoded or encrypted data. This allows a system to save different kinds of data in the same store, using the keys to know how to parse the data in code.

Since it’s key-value, the idea is that each value is not related to anything else in storage, so there are no foreign keys, joins or hierarchy. This makes reads very quick, so one of the most common uses is caching or session management.
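Here is a small sketch of that idea, assuming a dict as the store and a key-prefix convention (`session:`, `counter:`) that I made up for the example. Real key-value stores have their own clients and commands; this only shows the shape of the pattern.

```python
import json

# Hypothetical in-memory key-value store: every value is opaque bytes.
store = {}

def put(key, value):
    store[key] = value

def get(key):
    """Look up by key and use the key prefix to decide how to decode the bytes."""
    raw = store.get(key)
    if raw is None:
        return None
    if key.startswith("session:"):
        return json.loads(raw.decode("utf-8"))
    if key.startswith("counter:"):
        return int(raw.decode("utf-8"))
    return raw  # unknown prefix: hand back the raw bytes

put("session:abc123", json.dumps({"user": "ada", "cart": ["book"]}).encode("utf-8"))
put("counter:visits", b"42")
put("thumb:logo", b"\x89PNG")

session = get("session:abc123")
visits = get("counter:visits")
```

The store never looks inside the values; only the calling code knows that a `session:` key holds JSON and a `counter:` key holds a number.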

Same as with a document store, querying and analysis are harder because of how the information is spread out.

Column Store

This type of storage might seem like a relational database, but it’s different. Columns can be grouped into column families, and each row can have different columns, which gives a flexible but more complex schema. These kinds of stores are used more for analytics, e-commerce orders or timed events, basically write-heavy workloads.

Since the information is structured, the store can run complex queries, as long as the partition keys are properly set and the sort keys are modeled correctly. Creating a proper sort key requires research into how the data is related and how best to order the IDs that make up the key.
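A rough way to picture partition keys and sort keys is a dict of sorted lists. The sketch below is only a model under that assumption: the `insert` and `query_range` helpers, the customer IDs and the timestamp format are all invented for illustration.

```python
import bisect
from collections import defaultdict

# Hypothetical model: partition key -> rows kept ordered by sort key.
# Each row is (sort_key, columns), and columns may differ from row to row.
partitions = defaultdict(list)

def insert(partition_key, sort_key, columns):
    """Place the row at the position that keeps the partition sorted."""
    rows = partitions[partition_key]
    keys = [k for k, _ in rows]
    rows.insert(bisect.bisect_right(keys, sort_key), (sort_key, columns))

def query_range(partition_key, start, end):
    """Return rows in one partition with start <= sort_key < end."""
    rows = partitions.get(partition_key, [])
    keys = [k for k, _ in rows]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return rows[lo:hi]

# Orders partitioned by customer and sorted by an ISO-style timestamp.
insert("customer-1", "2024-10-03T09:00", {"status": "shipped", "total": 30})
insert("customer-1", "2024-10-01T12:00", {"status": "paid"})
insert("customer-1", "2024-09-15T08:30", {"status": "paid", "coupon": "FALL"})
insert("customer-2", "2024-10-02T10:00", {"status": "paid"})

october = query_range("customer-1", "2024-10-01", "2024-11-01")
```

The range query only touches one partition and reads a contiguous slice of it, which is why picking the right partition key and sort key order matters so much. A question like "all orders in October across every customer" would not fit this layout nearly as well.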

We can add nodes to a column store fairly easily, making this solution horizontally scalable.

Graph Databases

Graph databases are very interesting. They are designed to show relationships between nodes. The obvious example is a social network where each user is represented by a node, users are connected to each other, and any content a user posts is also represented as a node.

In a relational database, by comparison, there would be a users table, a follows table and a posts table. The same graph content can be stored there, but following the relationships takes a complex query with many joins, and it wouldn’t be as performant.
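To show what "following the relationships" looks like when the data is already shaped as a graph, here is a minimal sketch with adjacency lists and a breadth-first traversal. The `follows` data and the `within_hops` function are made up for the example and do not reflect any graph database's query language.

```python
from collections import deque

# Hypothetical social graph as adjacency lists: user -> users they follow.
follows = {
    "ada": ["linus", "grace"],
    "linus": ["grace", "ken"],
    "grace": ["ken"],
    "ken": [],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: users reachable from start in at most max_hops follows."""
    seen = {start: 0}  # user -> number of hops from start
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if seen[user] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(user, []):
            if neighbor not in seen:
                seen[neighbor] = seen[user] + 1
                queue.append(neighbor)
    del seen[start]
    return seen

reach = within_hops(follows, "ada", 2)
```

In a relational layout, each extra hop would typically be another self-join on the follows table, while here it is one more step along edges that are already attached to the node.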

Horizontal scalability takes a hit. Since the data is defined by direct relationships between nodes, we can partition the graph into smaller pieces, but then following a relationship may mean fetching related data from a different server.
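A tiny sketch of why partitioning hurts: assume the graph's nodes are placed on two servers (the `shard_of` mapping and the edge list below are invented), and count the relationships that end up crossing servers.

```python
# Hypothetical placement of graph nodes on two servers (shards).
shard_of = {"ada": 0, "linus": 0, "grace": 1, "ken": 1}

# Relationships between users, as (from, to) pairs.
edges = [
    ("ada", "linus"),
    ("ada", "grace"),
    ("linus", "grace"),
    ("linus", "ken"),
    ("grace", "ken"),
]

def cross_shard_edges(edges, shard_of):
    """Edges whose two endpoints live on different servers."""
    return [(a, b) for a, b in edges if shard_of[a] != shard_of[b]]

remote = cross_shard_edges(edges, shard_of)
```

In this toy layout more than half of the relationships cross servers, and every one of them would cost a network hop during a traversal. A different placement could do better, but on a well-connected graph it is hard to get that number near zero.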

Conclusion

Just like any other tech, each type of store came from a necessity. The same things can be accomplished with any of these flavors; it’s just that some are better at specific workloads than others, so knowing the overall pros and cons of each will help a lot.

You might start with a key-value store and then realize that a column store is a better fit because you need better querying, and that’s ok. Hopefully your system is nimble and can adjust accordingly by abstracting everything (future post).