How Monzo and DoorDash integrate external libraries with their code, and how web-scale companies manage a large number of microservices
Monzo, a UK-based online bank, recently shared insights into how they run migrations across 2800 microservices. In this post, I'll quickly bring up a key aspect they discussed in the article: how they integrate their application code with external libraries. The approach helps them switch to new libraries without significant code refactoring, subsequently resulting in efficient migrations.
Rather than integrating third-party code directly with their own, which results in tight coupling, they create an abstraction layer. The abstraction layer, in turn, communicates with the external code, acting as an adapter.
With the abstraction layer, based on dynamic configuration, they can switch between multiple libraries without the need for significant code refactoring. During a library change, the only layer that might need refactoring is the abstraction layer.
This approach also enabled Monzo Engineering to instrument external libraries with their own telemetry libraries, in addition to rolling forward and back without the need to redeploy all services.
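To make the idea concrete, here is a minimal sketch of such an abstraction layer. It is not Monzo's implementation. `LibraryAStore`, `LibraryBStore`, `StoreAdapter` and the `kv_backend` config key are hypothetical names, the two "libraries" are in-memory stand-ins, and the config is a plain dict standing in for a dynamic configuration service.

```python
import time
from typing import Dict, List, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    """The only interface application code is allowed to depend on."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class LibraryAStore:
    """Hypothetical stand-in for a wrapper around the current third-party client."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class LibraryBStore(LibraryAStore):
    """Hypothetical stand-in for the replacement library (own data, same interface)."""


class StoreAdapter:
    """Abstraction layer: picks the backend from config and records telemetry."""

    def __init__(
        self,
        backends: Dict[str, KeyValueStore],
        config: Dict[str, str],
        metrics: List[Tuple[str, str, float]],
    ) -> None:
        self._backends = backends
        self._config = config
        self._metrics = metrics

    def _timed(self, op: str, *args: str):
        # The config is read on every call, so flipping the value switches
        # libraries (or rolls back) without redeploying the calling service.
        name = self._config["kv_backend"]
        backend = self._backends[name]
        start = time.perf_counter()
        try:
            return getattr(backend, op)(*args)
        finally:
            # Telemetry lives in the adapter, not in the third-party code.
            self._metrics.append((name, op, time.perf_counter() - start))

    def get(self, key: str) -> Optional[str]:
        return self._timed("get", key)

    def put(self, key: str, value: str) -> None:
        self._timed("put", key, value)
```

Application code only ever sees `StoreAdapter`, so a library change touches the adapter and the config value, not the callers.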
DoorDash follows a similar approach when working with cache libraries like Redis and Caffeine. I've discussed this in an earlier newsletter post. Do give it a read.
As a rule of thumb, we should avoid tightly coupling external code with the application code and should always have an abstraction layer in between. This makes our code considerably more flexible and reusable.
That raises a further question: how do web-scale companies manage such a large number of microservices? 2800 microservices clearly cannot mean a dedicated team owning and managing each one. Uber has close to 4K microservices and Netflix over 1K.
Managing a large number of microservices
Automation and efficient microservices tooling are key drivers to managing and deploying such a large number of microservices at scale. For instance, Monzo has built mass deployment tooling that allows them to push out library changes across all services as an asynchronous batch job.
Microservices are tagged with criticality tiers and the least critical are deployed first, leveraging automated rollback checks using Argo Rollouts.
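As a rough illustration of tier-ordered rollout, the sketch below deploys the least critical services first and stops at the first failed health check. It does not use the Argo Rollouts API. `Service`, `rollout` and the callbacks are hypothetical, and the convention that a higher tier number means less critical is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Service:
    name: str
    tier: int  # assumption: higher number = less critical


def rollout(
    services: Iterable[Service],
    deploy: Callable[[Service], None],
    healthy: Callable[[Service], bool],
    rollback: Callable[[Service], None],
) -> List[str]:
    """Deploy least-critical services first; roll back and halt on the first failure."""
    completed: List[str] = []
    for svc in sorted(services, key=lambda s: s.tier, reverse=True):
        deploy(svc)
        if not healthy(svc):
            # Stop here so more critical services never receive the bad change.
            rollback(svc)
            break
        completed.append(svc.name)
    return completed
```

Because the most critical services come last, a faulty change is caught while its blast radius is still small.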
Small, revertible changes are deployed first, since they are quicker to implement and validate and have a minimal blast radius if things go wrong.
Strict monitoring, with alerts configured in Grafana and Prometheus, and auditing checks are put in place. As an example, the deployment pipeline automatically blocks operations like deploying a version to production that has not been deployed to staging yet. Furthermore, a record of pipeline events is maintained to quickly figure out what happened and roll things back if anything goes wrong.
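The staging-before-production check and the event record can be sketched in a few lines. This is an illustrative toy, not Monzo's pipeline. `Pipeline`, `PromotionBlocked` and the environment names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


class PromotionBlocked(Exception):
    """Raised when a version skips staging on its way to production."""


def _empty_environments() -> Dict[str, Set[Tuple[str, str]]]:
    return {"staging": set(), "production": set()}


@dataclass
class Pipeline:
    # (service, version) pairs that have been deployed to each environment.
    deployed: Dict[str, Set[Tuple[str, str]]] = field(default_factory=_empty_environments)
    # Append-only audit trail of (outcome, service, version, environment).
    events: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def deploy(self, service: str, version: str, env: str) -> None:
        if env == "production" and (service, version) not in self.deployed["staging"]:
            # Record the blocked attempt too, so the audit trail is complete.
            self.events.append(("blocked", service, version, env))
            raise PromotionBlocked(f"{service}@{version} has not been deployed to staging")
        self.deployed[env].add((service, version))
        self.events.append(("deployed", service, version, env))
```

Recording blocked attempts alongside successful deploys keeps the event log useful when reconstructing what happened.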
A lot of config-controlled abstraction has been put in place, as I discussed earlier, to facilitate efficient interaction with the underlying technologies, which also enhances the development experience.
Emphasis is put on technology consistency, that is, using similar technologies across microservices as much as possible to reduce friction when engineers move across services. This also enables them to reuse tooling written for a certain technology across microservices.
Similarly, every web-scale company leverages a set of tools to manage and scale their systems comprising a considerable number of microservices.
Service discovery and registry tools like Consul, Eureka, etc., enable microservices to discover each other dynamically and work in conjunction smoothly. Service mesh tools like Istio, Linkerd, etc., manage service-to-service communication, handling things like retries and circuit-breaking. Load balancers like Envoy, NGINX, etc., distribute traffic across service instances.
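For readers unfamiliar with retries and circuit-breaking, here is a small in-process sketch of the logic. A service mesh applies this kind of policy in its proxies rather than in application code, and this is not how Istio or Linkerd are configured. `CircuitBreaker`, `call_with_retries` and their parameters are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker fails fast instead of calling the service."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so one more failure re-opens it.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result


def call_with_retries(breaker: CircuitBreaker, fn: Callable[[], T], attempts: int = 3) -> T:
    """Retry failed calls, but give up immediately once the circuit is open."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise
        except Exception as exc:
            last_error = exc
    assert last_error is not None
    raise last_error
```

Retries absorb transient failures, while the breaker stops callers from piling more load onto a service that is already struggling.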
API gateways like Zuul manage access to backend services, also managing cross-cutting concerns like authentication, rate limiting, and caching. Container tooling like Docker packages microservices, and orchestration tools like Kubernetes manage their container-based deployment and scaling.
Distributed tracing tools like Zipkin, Jaeger, etc., facilitate request tracking across the distributed microservices architecture. Metric and logging tools like Prometheus, Grafana and the ELK stack help monitor system errors, performance and health.
I’ve discussed observability in detail here. Do check it out.
To manage configuration settings across thousands of microservices, tools like HashiCorp Vault are leveraged. CI/CD tools like Jenkins, GitLab CI, etc., are leveraged to efficiently deploy these services.
Infrastructure as code tools like Terraform, CloudFormation, etc., facilitate programmatic infrastructure management. Netflix's Chaos Monkey randomly terminates instances in production to ensure the services are resilient to instance failures.
In summary, web-scale companies leverage a lot of intricate tooling and robust system architecture with automation and observability to manage and scale a significantly large number of microservices.
For further reading, do check out the Monzo microservices migration post here.
I'll see you around in my next post. Until then, Cheers!