The thought process behind picking the right database for our service with a reference checklist
Picture the following data persistence requirements:
The DB should be able to handle a minimum of 1.5 million RPS (Requests Per Second): 1 million writes per second and 500K reads per second.
Eventual data consistency is fine.
Low response latency (< 5 ms)
Is highly available and scales on demand
Given these requirements, how would we pick the fitting database? How do we start our research? Do share with me your thoughts off the top of your head by replying to this email without running a search :)
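Before we dig in, it helps to make those numbers concrete. Below is a rough back-of-envelope sizing sketch in Python; the per-node capacities and headroom factor are assumptions for illustration, not vendor figures, so plug in numbers from your own benchmarks.

```python
# Back-of-envelope sizing for the stated requirements. The per-node
# capacities below are assumed placeholders, not vendor figures.
WRITES_PER_SEC = 1_000_000
READS_PER_SEC = 500_000

NODE_WRITE_CAPACITY = 50_000   # assumed sustained writes/sec per node
NODE_READ_CAPACITY = 100_000   # assumed sustained reads/sec per node
HEADROOM = 1.5                 # keep 50% spare capacity for spikes/failover

write_nodes = WRITES_PER_SEC * HEADROOM / NODE_WRITE_CAPACITY
read_nodes = READS_PER_SEC * HEADROOM / NODE_READ_CAPACITY

print(f"Total load: {WRITES_PER_SEC + READS_PER_SEC:,} RPS")
print(f"Roughly {write_nodes:.0f} nodes for writes, {read_nodes:.0f} for reads")
```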
Reading eventual consistency and horizontal scalability on demand immediately inclines us towards the NoSQL realm, and our minds start to rev, thinking of popular NoSQL databases. But before going there, we should ideally talk about the type of data we are dealing with and the typical application queries.
Is the data relational in nature, hierarchical, graph-based, key-value, semi-structured or unstructured? This is key in picking the fitting database to store our data efficiently.
If the data is relational, then a relational database would fit best, and we can further delve into techniques to make it more read-write performant while fulfilling the other requirements.
Storing data in a relational database does not necessarily mean that queries will always be slow due to JOINs and the like, or that it cannot be scaled. It all depends: on the DB design, the application queries, the nature of the data, the traffic patterns and so on.
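As a tiny, runnable illustration of how much the DB design matters, here is a SQLite sketch (with made-up table and column names) showing a JOIN's query plan flipping from a full table scan to an index search once the right index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
""")

query = ("SELECT u.name, o.total FROM orders o "
         "JOIN users u ON u.id = o.user_id WHERE o.user_id = ?")

# Without an index on orders.user_id, the plan scans the orders table.
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

conn.execute("CREATE INDEX idx_orders_user ON orders(user_id)")

# With the index in place, the plan switches to a direct index search.
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())
```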
Every database has its quirks and favors a certain type of data and read-write patterns. Being thorough with the type of data we are dealing with and the queries our service has to deal with helps us dramatically in narrowing down our search.
Also, it goes without saying that to run an effective technology search, we should be aware of the fundamentals of relational and non-relational databases, their strengths and weaknesses and such, including keeping tabs on the new persistence technologies that are continually being launched.
Taking our discussion further, let's say the data is non-relational, which takes us down the NoSQL road, offering us options such as DynamoDB, Redis, Cassandra, ScyllaDB, MongoDB and more.
A good idea would be to go through the specifications of every database and check the benchmarks, the consistency levels they support, how they scale, throughput, replication strategies, how they manage distributed deployments, the support they offer, and so on.
Additionally, we need to scan the engineering blogs of large-scale services to understand how they are leveraging these databases in production.
Cloud databases fit best if we need managed solutions as opposed to self-hosting. Serverless databases are a popular option in that realm. But at the same time, they come with vendor lock-in; that's the trade-off.
Once we narrow down to one or a few databases, the next step is to run a POC (Proof Of Concept). From there, we move to a pilot deployment and then to a full-scale deployment.
Only a POC can give us concrete data and observed outcomes on whether a certain database fits our project requirements, in addition to providing an idea of the deployment costs.
POC (Proof Of Concept)
As opposed to picking a tech based on how much it shines, we need to run a POC to verify that it fits our technical requirements. Running a POC enables us to measure performance metrics such as scalability, throughput and resource consumption, in addition to gauging the running costs.
We can figure out potential bottlenecks with a certain technology pretty early in the process, ensuring the tech can handle the expected volume of traffic and transactions. This averts the significant time and resource wastage we would face if the same issues surfaced during full-scale implementation.
A POC gives us concrete data and observed outcomes with which we can make further informed decisions. So, for instance, during the process of picking the right persistence technology for our use case, we can:
Set up the new database in a test environment.
Migrate a subset of our current data to the new database.
Run typical queries and operations to benchmark performance.
Stress-test the tech to see how it handles high loads (a minimal harness is sketched after this list).
Collect feedback from our development and operations teams on ease of use, integration with the existing architecture and such.
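For the benchmarking and stress-testing steps, here is a minimal load-test harness in Python. The `client.put` call is a hypothetical stand-in for whichever driver the candidate database actually ships:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_write_benchmark(client, requests=10_000, workers=32):
    """Fire `requests` writes through `client.put` (a hypothetical
    driver call) and report throughput plus p50/p99 latency."""
    latencies = []

    def one_write(i):
        start = time.perf_counter()
        client.put(f"key-{i}", f"value-{i}")
        latencies.append((time.perf_counter() - start) * 1000)  # ms

    began = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(one_write, range(requests)))  # drain to force execution
    elapsed = time.perf_counter() - began

    latencies.sort()
    print(f"Throughput: {requests / elapsed:,.0f} ops/sec")
    print(f"p50 latency: {latencies[len(latencies) // 2]:.2f} ms")
    print(f"p99 latency: {latencies[int(len(latencies) * 0.99)]:.2f} ms")
```

In a real POC, we would mirror production key sizes, payloads and the read/write mix rather than using synthetic keys like these.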
Additionally, when picking a new tech, we need to conduct thorough research on the technology, including its documentation, whether it is actively developed, community support, product stability, existing real-world deployments, the associated licensing, maintenance and training costs, security considerations, etc.
Post POC
After a successful POC, we could implement the tech in a certain module of our project, or in a certain service, on a small scale in production, as opposed to going all in.
After monitoring the system behavior and performance, we can roll out the full-scale deployment. This approach minimizes risk.
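As a sketch of what that small-scale rollout can look like in code: route a small, configurable fraction of reads to the new database while the incumbent stays the source of truth. The `old_db` and `new_db` clients and their `get` call are hypothetical stand-ins:

```python
import random

ROLLOUT_FRACTION = 0.05  # start by sending 5% of reads to the new DB

def read_record(record_id, old_db, new_db):
    """Serve a small fraction of reads from the new database, falling
    back to the incumbent if anything goes wrong."""
    if random.random() < ROLLOUT_FRACTION:
        try:
            return new_db.get(record_id)  # hypothetical client call
        except Exception:
            pass  # a misbehaving candidate must never break reads
    return old_db.get(record_id)
```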
Picking the right database or the fitting technology (any tech, for that matter) entails an understanding of its nuances, to avoid a situation where a certain shortcoming or quirk throws a wrench into the gears in production.
There is no formula or straightforward answer that fits every use case. The only answer that fits best is: it depends. It depends on the factors we discussed above. Also, there are several use cases where a single database fails to serve all our requirements, and we proceed with polyglot persistence, with multiple databases working together (say, a relational store for orders alongside a key-value store for sessions).
I had a detailed discussion with a friend the other day on this very topic, as he was researching the fitting persistence technology for his project. I thought I'd write about it. Furthermore, I've prepared a checklist that will serve as a reference for our future DB research.
Database research checklist
Performance requirements
Throughput
Read throughput: The number of read operations the database must handle per second.
Write throughput: The number of write operations the database must handle per second.
Total Requests per Second (RPS): Combined read and write operations the DB must handle per second. This gives a comprehensive picture of the DB load.
Latency
Maximum acceptable read latency
Maximum acceptable write latency
Maximum acceptable consistency latency: The time it takes for the data to become consistent across replicas in an eventually consistent setup (a minimal probe for measuring this follows).
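A simple way to observe this metric, assuming hypothetical `primary` and `replica` clients exposing `set`/`get` calls:

```python
import time
import uuid

def measure_replication_lag(primary, replica, timeout=5.0):
    """Write a unique marker via the primary and poll the replica until
    it appears; return the observed lag in milliseconds."""
    key = f"lag-probe:{uuid.uuid4()}"
    start = time.perf_counter()
    primary.set(key, "1")                 # hypothetical client call
    while time.perf_counter() - start < timeout:
        if replica.get(key) is not None:  # hypothetical client call
            return (time.perf_counter() - start) * 1000
        time.sleep(0.001)                 # poll every millisecond
    raise TimeoutError("replica did not converge within the timeout")
```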
Scalability
Horizontal scalability
Vertical scalability
Elasticity: The ability to scale up/down automatically based on the load.
Availability and Reliability
High availability
Fault tolerance and redundancy.
Multi-region/multi-zone deployment.
Disaster recovery
Backup and restore capabilities.
Data replication and failover mechanisms.
Durability
Ensuring data is not lost in the case of failures.
Data requirements
Data volume
Current data size.
Expected data growth over time.
Data model
Structure: Relational/Non-relational (document, key-value, graph, etc.)
Consistency
Strong consistency
Eventual consistency
Tunable consistency
Check out my newsletter post for a detailed read on data consistency.
Operational Ease
Management complexity
Ease of setup and configuration.
Availability of effective monitoring and maintenance tools.
Administrative overhead and required expertise.
Support and documentation
Availability of professional support from the vendor.
Quality of documentation and community support.
Ecosystem and integrations
Compatibility with existing tools and applications.
Availability of drivers and connectors for various programming languages and frameworks.
Security and Compliance
Security features
Authentication and authorization mechanisms.
Encryption (in transit and at rest).
Auditing and logging capabilities.
Compliance
Adherence to industry standards and regulations (GDPR, HIPAA, etc.).
Cost considerations
Licensing costs
Open-source vs. commercial licensing fees.
Enterprise editions and their costs.
Operational costs
Infrastructure costs (cloud vs. on-premises).
Maintenance and administration costs.
Scalability costs
Cost implications of scaling horizontally or vertically.
Auto-scaling costs in cloud environments.
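A rough way to put numbers on these scaling costs, with placeholder prices (substitute your provider's actual rates):

```python
# All figures below are placeholder assumptions, not real prices.
HOURS_PER_MONTH = 730

small_nodes, small_hourly = 30, 0.40   # scale out: many small instances
large_nodes, large_hourly = 8, 1.90    # scale up: a few large instances

scale_out_cost = small_nodes * small_hourly * HOURS_PER_MONTH
scale_up_cost = large_nodes * large_hourly * HOURS_PER_MONTH

print(f"Scale out: ${scale_out_cost:,.0f}/month")   # $8,760
print(f"Scale up:  ${scale_up_cost:,.0f}/month")    # $11,096
```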
Documentation and Training
Training Resources
Availability of tutorials, courses and certification programs.
Access to a knowledgeable community or user group.
Case Studies
Similar use cases to ours and success stories.
Performance and Testing
Benchmarking
Conduct performance tests under load conditions.
Compare the results against the expected throughput and latency.
Proof of Concept (POC)
Implement a small-scale version of our application.
Test real-world performance, scalability, and reliability.
Future-Proofing
Roadmap and Updates
Vendor’s commitment to future improvements and features.
Community activity and upcoming releases.
Flexibility
Ability to adapt to changing requirements.
Ease of migration to other databases if needed.
I believe this checklist comprehensively covers the key points that we need to bear in mind when researching the right database for our project.
If you found this post insightful, do share the web link with your network for more reach. You can connect with me on LinkedIn & X, in addition to replying to this email.
If you wish to learn system architecture from the bare bones, check out the Zero to System Architecture Proficiency learning path I've authored that educates you, step by step, on the domain of software architecture, cloud infrastructure and distributed system design.
I'll see you around in my next post. Until then, Cheers!