The thought process behind picking the right database for our service with a reference checklist
Picture the following data persistence requirements:
The DB should be able to handle a minimum of 1.5 million RPS (Requests Per Second): 1 million writes per second and 500K reads per second.
Eventual data consistency is fine.
Low response latency (< 5 ms)
Is highly available and scales on demand
Given these requirements, how would we pick the fitting database? How do we start our research? Do share with me your thoughts off the top of your head by replying to this email without running a search :)
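Before we dig in, it helps to make those numbers concrete. Below is a rough back-of-envelope sizing sketch in Python; the per-node capacities and headroom factor are assumptions for illustration, not vendor figures, so plug in numbers from your own benchmarks.

```python
# Back-of-envelope sizing for the stated requirements. The per-node
# capacities below are assumed placeholders, not vendor figures.
WRITES_PER_SEC = 1_000_000
READS_PER_SEC = 500_000

NODE_WRITE_CAPACITY = 50_000   # assumed sustained writes/sec per node
NODE_READ_CAPACITY = 100_000   # assumed sustained reads/sec per node
HEADROOM = 1.5                 # keep 50% spare capacity for spikes/failover

write_nodes = WRITES_PER_SEC * HEADROOM / NODE_WRITE_CAPACITY
read_nodes = READS_PER_SEC * HEADROOM / NODE_READ_CAPACITY

print(f"Total load: {WRITES_PER_SEC + READS_PER_SEC:,} RPS")
print(f"Roughly {write_nodes:.0f} nodes for writes, {read_nodes:.0f} for reads")
```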
Reading eventual consistency and horizontal scalability on demand immediately inclines us towards the NoSQL realm, and our minds start to rev, thinking of popular NoSQL databases. But before going there, we should ideally talk about the type of data we are dealing with and the typical application queries.
Is the data relational in nature, hierarchical, graph-based, key-value, semi-structured or unstructured? This is key in picking the fitting database to store our data efficiently.
If the data is relational, then a relational database would fit best, and we can further delve into techniques to make it more read-write performant while fulfilling the other requirements.
Storing data in a relational database does not necessarily mean that queries will always be slow due to JOINs and the like, or that it cannot be scaled. It all depends: on the DB design, the application queries, the nature of the data, the traffic patterns and so on.
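As a tiny, runnable illustration of how much the DB design matters, here is a SQLite sketch (with made-up table and column names) showing a JOIN's query plan flipping from a full table scan to an index search once the right index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
""")

query = ("SELECT u.name, o.total FROM orders o "
         "JOIN users u ON u.id = o.user_id WHERE o.user_id = ?")

# Without an index on orders.user_id, the plan scans the orders table.
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

conn.execute("CREATE INDEX idx_orders_user ON orders(user_id)")

# With the index in place, the plan switches to a direct index search.
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())
```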
Every database has its quirks and favors a certain type of data and read-write patterns. Being thorough with the type of data we are dealing with and the queries our service has to deal with helps us dramatically in narrowing down our search.
Also, it goes without saying that to run an effective technology search, we should be aware of the fundamentals of relational and non-relational databases, their strengths and weaknesses and such, including keeping tabs on the new persistence technologies that are continually being launched.
Taking our discussion further, let's say the data is non-relational, which takes us down the NoSQL road, offering us options such as DynamoDB, Redis, Cassandra, ScyllaDB, MongoDB and more.
A good idea would be to go through the specifications of every database and check the benchmarks, the consistency levels they support, how they scale, throughput, replication strategies, how they manage distributed deployments, the support they offer, and so on.
Additionally, we need to scan the engineering blogs of large-scale services to understand how they are leveraging these databases in production.
Cloud databases fit best if we need managed solutions as opposed to self-hosting. Serverless databases are a popular option in that realm. But at the same time, they come with vendor lock-in; that's the trade-off.
Once we narrow down to one or a few databases, the next step is to run a POC (Proof Of Concept). From there, we move to a pilot deployment and then to a full-scale deployment.
Only a POC can give us concrete data and observed outcomes on whether a certain database fits our project requirements, in addition to providing an idea of the deployment costs.
POC (Proof Of Concept)
As opposed to picking a tech based on how much it shines, we need to run a POC to verify that it fits our technical requirements. Running a POC enables us to measure performance metrics such as scalability, throughput and resource consumption, in addition to gauging the running costs.
We can figure out potential bottlenecks with a certain technology pretty early in the process, ensuring the tech can handle the expected volume of traffic and transactions. This averts the significant time and resource wastage we would face if the same issues surfaced during full-scale implementation.
A POC gives us concrete data and observed outcomes with which we can make further informed decisions. So, for instance, during the process of picking the right persistence technology for our use case, we can:
Set up the new database in a test environment.
Migrate a subset of our current data to the new database.
Run typical queries and operations to benchmark performance.
Stress-test the tech to see how it handles high loads (a minimal harness is sketched after this list).
Collect feedback from our development and operations teams on ease of use, integration with the existing architecture and such.
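For the benchmarking and stress-testing steps, here is a minimal load-test harness in Python. The `client.put` call is a hypothetical stand-in for whichever driver the candidate database actually ships:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_write_benchmark(client, requests=10_000, workers=32):
    """Fire `requests` writes through `client.put` (a hypothetical
    driver call) and report throughput plus p50/p99 latency."""
    latencies = []

    def one_write(i):
        start = time.perf_counter()
        client.put(f"key-{i}", f"value-{i}")
        latencies.append((time.perf_counter() - start) * 1000)  # ms

    began = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(one_write, range(requests)))  # drain to force execution
    elapsed = time.perf_counter() - began

    latencies.sort()
    print(f"Throughput: {requests / elapsed:,.0f} ops/sec")
    print(f"p50 latency: {latencies[len(latencies) // 2]:.2f} ms")
    print(f"p99 latency: {latencies[int(len(latencies) * 0.99)]:.2f} ms")
```

In a real POC, we would mirror production key sizes, payloads and the read/write mix rather than using synthetic keys like these.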
Additionally, when picking a new tech, we need to conduct thorough research on the technology, including its documentation, whether it is actively developed, community support, product stability, existing real-world deployments, the associated licensing, maintenance and training costs, security considerations, etc.
Post POC
After a successful POC, we could implement the tech in a certain module of our project, or in a certain service, on a small scale in production, as opposed to going all in.
After monitoring the system behavior and performance, we can roll out the full-scale deployment. This approach minimizes risk.
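As a sketch of what that small-scale rollout can look like in code: route a small, configurable fraction of reads to the new database while the incumbent stays the source of truth. The `old_db` and `new_db` clients and their `get` call are hypothetical stand-ins:

```python
import random

ROLLOUT_FRACTION = 0.05  # start by sending 5% of reads to the new DB

def read_record(record_id, old_db, new_db):
    """Serve a small fraction of reads from the new database, falling
    back to the incumbent if anything goes wrong."""
    if random.random() < ROLLOUT_FRACTION:
        try:
            return new_db.get(record_id)  # hypothetical client call
        except Exception:
            pass  # a misbehaving candidate must never break reads
    return old_db.get(record_id)
```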
Picking the right database or the fitting technology (any tech, for that matter) entails an understanding of its nuances, to avoid a situation where a certain shortcoming or quirk throws a wrench into the gears in production.
There is no formula or straightforward answer that fits every use case. The only answer that fits best is: it depends. It depends on the factors we discussed above. Also, there are several use cases where a single database fails to serve all our requirements, and we proceed with polyglot persistence, with multiple databases working together (say, a relational store for orders alongside a key-value store for sessions).
I had a detailed discussion with a friend the other day on this very topic, as he was researching the fitting persistence technology for his project. I thought I'd write about it. Furthermore, I've prepared a checklist that will serve as a reference for our future DB research.
Database research checklist
Performance requirements
Throughput
Read throughput: The number of read operations the database must handle per second.
Write throughput: The number of write operations the database must handle per second.
Total Requests per Second (RPS): Combined read and write operations the DB must handle per second. This gives a comprehensive picture of the DB load.
Latency
Maximum acceptable read latency
Maximum acceptable write latency
Maximum acceptable consistency latency: The time it takes for the data to become consistent across replicas in an eventually consistent setup (a minimal probe for measuring this follows).
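A simple way to observe this metric, assuming hypothetical `primary` and `replica` clients exposing `set`/`get` calls:

```python
import time
import uuid

def measure_replication_lag(primary, replica, timeout=5.0):
    """Write a unique marker via the primary and poll the replica until
    it appears; return the observed lag in milliseconds."""
    key = f"lag-probe:{uuid.uuid4()}"
    start = time.perf_counter()
    primary.set(key, "1")                 # hypothetical client call
    while time.perf_counter() - start < timeout:
        if replica.get(key) is not None:  # hypothetical client call
            return (time.perf_counter() - start) * 1000
        time.sleep(0.001)                 # poll every millisecond
    raise TimeoutError("replica did not converge within the timeout")
```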
Scalability
Horizontal scalability
Vertical scalability
Elasticity: The ability to scale up/down automatically based on the load.
Availability and Reliability
High availability
Fault tolerance and redundancy.
Multi-region/multi-zone deployment.
Disaster recovery
Backup and restore capabilities.
Data replication and failover mechanisms.
Durability
Ensuring data is not lost in the case of failures.
Data requirements
Data volume
Current data size.
Expected data growth over time.
Data model
Structure: Relational/Non-relational (document, key-value, graph, etc.)
Consistency
Strong consistency
Eventual consistency
Tunable consistency
Check out my newsletter post for a detailed read on data consistency.
Operational Ease
Management complexity
Ease of setup and configuration.
Availability of effective monitoring and maintenance tools.
Administrative overhead and required expertise.
Support and documentation
Availability of professional support from the vendor.
Quality of documentation and community support.
Ecosystem and integrations
Compatibility with existing tools and applications.
Availability of drivers and connectors for various programming languages and frameworks.
Security and Compliance
Security features
Authentication and authorization mechanisms.
Encryption (in transit and at rest).
Auditing and logging capabilities.
Compliance
Adherence to industry standards and regulations (GDPR, HIPAA, etc.).
Cost considerations
Licensing costs
Open-source vs. commercial licensing fees.
Enterprise editions and their costs.
Operational costs
Infrastructure costs (cloud vs. on-premises).
Maintenance and administration costs.
Scalability costs
Cost implications of scaling horizontally or vertically.
Auto-scaling costs in cloud environments.
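A rough way to put numbers on these scaling costs, with placeholder prices (substitute your provider's actual rates):

```python
# All figures below are placeholder assumptions, not real prices.
HOURS_PER_MONTH = 730

small_nodes, small_hourly = 30, 0.40   # scale out: many small instances
large_nodes, large_hourly = 8, 1.90    # scale up: a few large instances

scale_out_cost = small_nodes * small_hourly * HOURS_PER_MONTH
scale_up_cost = large_nodes * large_hourly * HOURS_PER_MONTH

print(f"Scale out: ${scale_out_cost:,.0f}/month")   # $8,760
print(f"Scale up:  ${scale_up_cost:,.0f}/month")    # $11,096
```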
Documentation and Training
Training Resources
Availability of tutorials, courses and certification programs.
Access to a knowledgeable community or user group.
Case Studies
Similar use cases to ours and success stories.
Performance and Testing
Benchmarking
Conduct performance tests under load conditions.
Compare the results against the expected throughput and latency.
Proof of Concept (POC)
Implement a small-scale version of our application.
Test real-world performance, scalability, and reliability.
Future-Proofing
Roadmap and Updates
Vendor’s commitment to future improvements and features.
Community activity and upcoming releases.
Flexibility
Ability to adapt to changing requirements.
Ease of migration to other databases if needed.
I believe this checklist comprehensively covers the key points that we need to bear in mind when researching the right database for our project.
If you found this post insightful, do share the web link with your network for more reach. You can connect with me on LinkedIn & X, in addition to replying to this email.
If you wish to learn system architecture from the bare bones, check out the Zero to System Architecture Proficiency learning path I've authored that educates you, step by step, on the domain of software architecture, cloud infrastructure and distributed system design.
I'll see you around in my next post. Until then, Cheers!