<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Web Scale (Systems Architecture & Systems Programming)]]></title><description><![CDATA[Insightful discussions on web-scale distributed systems and backend engineering in general.]]></description><link>https://shivangsnewsletter.com</link><image><url>https://substackcdn.com/image/fetch/$s_!8Xg8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d35d7e-83e0-445f-99b4-7a2f9f9c37b5_500x500.png</url><title>Web Scale (Systems Architecture &amp; Systems Programming)</title><link>https://shivangsnewsletter.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 06 Apr 2026 09:07:47 GMT</lastBuildDate><atom:link href="https://shivangsnewsletter.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Shivang Sarawagi]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[shivangsarawagi@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[shivangsarawagi@substack.com]]></itunes:email><itunes:name><![CDATA[Shivang Sarawagi]]></itunes:name></itunes:owner><itunes:author><![CDATA[Shivang Sarawagi]]></itunes:author><googleplay:owner><![CDATA[shivangsarawagi@substack.com]]></googleplay:owner><googleplay:email><![CDATA[shivangsarawagi@substack.com]]></googleplay:email><googleplay:author><![CDATA[Shivang Sarawagi]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How Unikraft Cloud reduces serverless cold starts to milliseconds with unikernels and microVMs]]></title><description><![CDATA[In my former post, 'Why Cloudflare does not use containers in their Workers platform,' I 
discussed the V8 isolate architecture that enables them to achieve sub-millisecond serverless latency, supporting a significantly large number of tenants who could run their workloads independently without sharing memory and state at the edge.]]></description><link>https://shivangsnewsletter.com/p/how-unikraft-cloud-reduces-serverless</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/how-unikraft-cloud-reduces-serverless</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Sat, 09 Nov 2024 03:25:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OqgZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my previous post, '<a href="https://shivangsnewsletter.com/p/why-doesnt-cloudflare-use-containers">Why Cloudflare does not use containers in their Workers platform</a>,' I discussed the V8 isolate architecture that lets them achieve sub-millisecond serverless latency while supporting a very large number of tenants who run their workloads independently at the edge without sharing memory or state. If you haven't read it yet, it's a recommended read.</p><p>This follow-up post discusses how Unikraft Cloud, a serverless platform, achieves millisecond cold starts and supports a relatively large number of workloads on a single server instance by leveraging unikernels and microVMs.</p><p>On my Cloudflare post, I got a few comments and messages asking why Cloudflare designed isolates as opposed to leveraging unikernels. 
In the concluding part of this post, I'll discuss the differences between both approaches and why isolates are a better fit for Cloudflare in contrast to going forward with the unikernel approach.</p><p>With that being said, let's get on with it.</p><h2>What are unikernels?</h2><p>Unikernels are single-purpose OS images, with only the necessary OS features clubbed with the application code to form a minimal, lightweight, highly optimized runtime image.</p><p>Unikernels only contain the OS features and libraries that are required by the application code to run. They are compiled in the form of a single binary that can either run directly on the bare metal or a hypervisor without the need for an underlying general-purpose OS.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OqgZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OqgZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png 424w, https://substackcdn.com/image/fetch/$s_!OqgZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png 848w, https://substackcdn.com/image/fetch/$s_!OqgZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OqgZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OqgZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png" width="1456" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33076,&quot;alt&quot;:&quot;Unikernel&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Unikernel" title="Unikernel" srcset="https://substackcdn.com/image/fetch/$s_!OqgZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png 424w, https://substackcdn.com/image/fetch/$s_!OqgZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png 848w, https://substackcdn.com/image/fetch/$s_!OqgZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OqgZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F832bdd67-2af2-4d26-860c-1a627be084f3_1559x717.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In comparison to traditional VMs or containers that ideally run a general-purpose OS, unikernels are more lightweight, with a stripped-down version of the OS designed to run a specific application or a service. 
Getting rid of all the unnecessary features cuts down on memory usage, boot time and CPU cycles, markedly improving performance.</p><h2>How are the unnecessary OS components stripped out?</h2><p>This is achieved by an architectural approach called the library operating system (library OS), which packages OS functionalities like networking, file I/O, memory management, etc., as modular libraries.</p><p>With this, we can selectively pick the OS features that our application code requires and package them together into a single binary, cutting overhead and allowing us to build minimal, application-specific runtimes without the bloat.</p><h2>Unikernels are single-process systems</h2><p>Unikernels are typically single-process systems. This means they are designed to run a single application or a service as a single process with a minimal OS layer. This obviates the need for traditional multi-process environments that enable multiple applications or services to run concurrently. So, there is no context switching between processes.</p><p>Furthermore, there is no separation between user space and kernel space: the application and the OS code share a single address space. This simplifies memory management and improves performance by eliminating the switches between the two spaces.</p><h2>Running unikernels on microVMs</h2><p>Once the unikernel image is ready, it can run directly on bare metal or a VM. 
However, it is more commonly run on a microVM leveraging a lightweight virtualization technology like Firecracker.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sh49!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sh49!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png 424w, https://substackcdn.com/image/fetch/$s_!Sh49!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png 848w, https://substackcdn.com/image/fetch/$s_!Sh49!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png 1272w, https://substackcdn.com/image/fetch/$s_!Sh49!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sh49!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png" width="1456" height="751" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:751,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36071,&quot;alt&quot;:&quot;Unikernel deployment&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Unikernel deployment" title="Unikernel deployment" srcset="https://substackcdn.com/image/fetch/$s_!Sh49!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png 424w, https://substackcdn.com/image/fetch/$s_!Sh49!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png 848w, https://substackcdn.com/image/fetch/$s_!Sh49!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png 1272w, https://substackcdn.com/image/fetch/$s_!Sh49!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcce60db4-44fa-4d4b-99d6-891f887b2dd1_1799x928.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><blockquote><p>A microVM is a lightweight, stripped-down virtual machine designed to contain only the essential elements of virtualization cutting down the overhead that comes along with the traditional VMs. </p><p>These are optimized for environments where minimal startup time and reduced resource usage are required, like in serverless computing. </p><p><a href="https://firecracker-microvm.github.io/">Firecracker</a>, developed by AWS, is a minimalist VM monitor that enables us to deploy workloads in microVMs. 
It's used by AWS Lambda and Fargate to run isolated workloads with minimal resource overhead.</p></blockquote><p>Running unikernels on microVMs combines the benefits of both, making the setup well suited to serverless and edge use cases, such as running a large number of multi-tenant workloads on the same physical server with strong isolation between unikernel instances, one per microVM.</p><p>Since unikernels and microVMs are lightweight, modern physical servers can run thousands of microVMs in parallel, managed by a VM monitor like Firecracker with minimal overhead.</p><h2>Handling a request</h2><p>When a request arrives, the VM monitor launches a new microVM, with the unikernel already loaded, within milliseconds, so the serverless cold start time is negligible.</p><p>Furthermore, if a unikernel has a lengthy initialization time due to the complexity of its application, a snapshot of its 'ready-to-serve' state is taken. For subsequent requests, the unikernel is restored from this snapshot directly into memory, which cuts down on the loading time.</p><blockquote><p>The ready-to-serve state snapshot includes the entire memory state of the application at the point it&#8217;s fully prepared to start handling requests.</p></blockquote><h2>Comparing Unikernels with Cloudflare isolates</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kGag!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kGag!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png 424w, 
https://substackcdn.com/image/fetch/$s_!kGag!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png 848w, https://substackcdn.com/image/fetch/$s_!kGag!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!kGag!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kGag!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png" width="1437" height="1018" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1018,&quot;width&quot;:1437,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70412,&quot;alt&quot;:&quot;V8 isolate architecture Cloudflare&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="V8 isolate architecture Cloudflare" title="V8 isolate architecture Cloudflare" srcset="https://substackcdn.com/image/fetch/$s_!kGag!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png 424w, 
https://substackcdn.com/image/fetch/$s_!kGag!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png 848w, https://substackcdn.com/image/fetch/$s_!kGag!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!kGag!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd493a040-04f4-43fd-99bb-2c9b2319105b_1437x1018.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Unikernels and <a href="https://shivangsnewsletter.com/p/why-doesnt-cloudflare-use-containers">isolates</a> differ in architecture and use cases. Unikernels achieve isolation by running as separate VMs or on bare metal, relying on hardware-level isolation. In contrast, Cloudflare's isolates provide isolation at the language runtime level within a shared process, which makes them more lightweight and makes switching between workloads faster.</p><p>Unikernels, while efficient, are more resource-intensive compared to isolates.</p><p>Furthermore, Cloudflare's isolates primarily support JavaScript and languages that compile to WebAssembly. Unikernels can be built using various programming languages, depending on the unikernel framework used.</p><p>Unikernels are focused on creating single binaries that can be run directly on hardware or microVMs, providing better isolation.</p><p>The choice between them depends on the application requirements.</p><blockquote><p>If you wish to delve deeper into how Firecracker technology works, do go through this <a href="https://www.amazon.science/publications/firecracker-lightweight-virtualization-for-serverless-applications">research paper</a>. </p><p>Check out <a href="https://unikraft.cloud/">Unikraft Cloud</a> as well for more info.</p></blockquote><h2>Resources to upskill on cloud computing and systems programming</h2><p>To learn the fundamentals of cloud computing, including concepts like FaaS, containers, VMs, deployment infrastructure and more, do check out my <a href="https://learnsoftwarearchitecture.com/cloud-computing-101-master-the-fundamentals">cloud course</a>. 
It's a part of the <a href="https://learnsoftwarearchitecture.com/">zero to system architecture bundle</a>, which takes you from zero to mastering the fundamentals of system architecture.</p><p>Furthermore, to learn to code distributed systems like Git, Redis, Kafka, and more from the bare bones in the programming language of your choice, check out <a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>.</p><p>With their interactive, hands-on exercises, you'll develop a solid understanding of how these systems work, augmenting your domain knowledge and helping you become a better engineer. If you decide to make a purchase, you can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a> (affiliate).</p><h2>Recommended reads on systems programming</h2><p>I've implemented a&nbsp;<a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">single-threaded</a>&nbsp;and&nbsp;<a href="https://shivangsnewsletter.com/p/distributed-programming-part-3">multithreaded</a>&nbsp;TCP/IP server in Java. You can go through the linked posts to understand how servers function.</p><p>I am writing a crash course on building a distributed message broker like Kafka from the bare bones. You can read the <a href="https://shivangsnewsletter.com/p/systems-programming-coding-message-broker">first post here</a>. The next post in the series is lined up and dropping pretty soon.</p><p>If you haven't subscribed yet, do subscribe to have my posts slide into your inbox as soon as they are published.</p><p>If you found this post insightful, consider sharing it with your friends for more reach. You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a>&nbsp;as well. 
I'll see you in the next post.</p><p>Until then, Bye!</p>]]></content:encoded></item><item><title><![CDATA[Why doesn't Cloudflare use containers in their Workers platform infrastructure?]]></title><description><![CDATA[Cloudflare enables its customers to run serverless code at the edge globally at blazing speeds with almost zero cold startup time.]]></description><link>https://shivangsnewsletter.com/p/why-doesnt-cloudflare-use-containers</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/why-doesnt-cloudflare-use-containers</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Sat, 02 Nov 2024 12:16:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WXnU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Cloudflare enables its customers to run serverless code at the edge globally at blazing speeds with almost zero cold startup time.</p><p>They achieve this with their V8-powered deployment architecture in contrast to going forward with the 
traditional containers and Kubernetes approach.</p><p>The prime reason behind not using containers or VMs is to achieve sub-millisecond serverless latency while supporting a very large number of tenants who run their workloads independently at the edge without sharing memory or state.</p><p>We are aware that serverless instances spin up to handle requests and spin down when idle to save costs. This is a trade-off between latency and running costs. It can take anywhere from 500 ms to 10 seconds to spin up a container or a VM to process a request, resulting in an unpredictable code execution time.</p><p>Cloudflare's V8 isolate architecture, in contrast, warms up a function in under 5 milliseconds. However, this approach has trade-offs like any other design decision, which I'll discuss in the later part of this post.&nbsp;&nbsp;</p><p>Let's first delve into the V8 isolate design.</p><h2>Cloudflare V8 isolate architecture</h2><p>The V8 isolate architecture leverages the V8 engine (a high-performance JavaScript and WebAssembly engine originally developed by Google for Chrome) to run isolates, which are lightweight sandboxed environments running individual workloads.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WXnU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WXnU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png 424w, 
https://substackcdn.com/image/fetch/$s_!WXnU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png 848w, https://substackcdn.com/image/fetch/$s_!WXnU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!WXnU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WXnU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png" width="1437" height="1018" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1018,&quot;width&quot;:1437,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70412,&quot;alt&quot;:&quot;Cloudflare V8 isolate architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Cloudflare V8 isolate architecture" title="Cloudflare V8 isolate architecture" srcset="https://substackcdn.com/image/fetch/$s_!WXnU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png 424w, 
https://substackcdn.com/image/fetch/$s_!WXnU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png 848w, https://substackcdn.com/image/fetch/$s_!WXnU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!WXnU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eacbf5d-b0d1-44ec-a0cc-5bc0a1547f9f_1437x1018.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>A single V8 engine instance can run multiple isolates, each running an independent workload with strict code isolation. Code from one isolate cannot access the memory or data of another.</p><p>However, all these isolates share the same OS process, which makes them extremely lightweight and fast.</p><p>An isolate has its own mechanism to ensure safe memory access, which makes it possible to run untrusted code from many different customers/tenants within a single V8 engine instance run by an operating system process.</p><p>Every isolate executes code in an event-driven model, which suits IO-intensive tasks.</p><h2>Cold start latency with traditional containerized serverless deployments</h2><p>In a traditional serverless architecture, a request that arrives when no warm function instance is available triggers the startup of a container, which packages an OS layer, application dependencies and the function code.</p><p>Initializing a container takes time. This delay, known as cold start latency, can range from hundreds of milliseconds to multiple seconds.</p><p>Also, since a function instance ideally runs in its own container process with process-level isolation, running a large number of tenant workloads serving millions of concurrent requests becomes highly resource-intensive.</p><blockquote><p>In cloud environments, isolation is typically achieved at the VM or container level. Each workload runs as a separate container or VM with its own dedicated resources, reducing the risk of data leaks and unauthorized access. 
</p><p>Furthermore, if a certain process crashes, it doesn't affect the other workloads running on the same machine.</p></blockquote><h2>V8 isolates</h2><p>In contrast to containerized deployments, an isolate running in the V8 engine runtime starts, on average, in under a millisecond, since no container or OS process has to boot.</p><p>The V8 runtime is already running, and it spawns an isolate in under a millisecond when one is required to process a request.</p><p>In this architecture, multiple isolates run within the same V8 engine with minimal resources. This enables the platform to host and run code for a relatively large number of tenants at the edge and to scale almost instantaneously.</p><p>Also, since a single OS process can run hundreds or thousands of isolates, this avoids the inter-process context switching costs associated with container-based deployments.</p><p>This is crucial for Cloudflare as it runs thousands of tenant workloads at the edge on every machine and needs to rapidly switch between them thousands of times per second with minimal overhead.</p><p>If these workloads ran as separate containerized processes, the infrastructure could support far fewer customers. This is the key motivation behind a resource-optimized and scalable deployment approach like V8 isolates.</p><h2>Understanding the user space and kernel space</h2><p>Operating systems are split into user space and kernel space to separate application tasks from the core system tasks. 
Keeping control over how applications interact with the underlying OS and hardware resources is essential for security and performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i3kE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i3kE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png 424w, https://substackcdn.com/image/fetch/$s_!i3kE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png 848w, https://substackcdn.com/image/fetch/$s_!i3kE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!i3kE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i3kE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png" width="1437" height="1018" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1018,&quot;width&quot;:1437,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70412,&quot;alt&quot;:&quot;Cloudflare V8 isolate architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Cloudflare V8 isolate architecture" title="Cloudflare V8 isolate architecture" srcset="https://substackcdn.com/image/fetch/$s_!i3kE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png 424w, https://substackcdn.com/image/fetch/$s_!i3kE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png 848w, https://substackcdn.com/image/fetch/$s_!i3kE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!i3kE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92ae55e4-5399-42b1-aa79-27f354873631_1437x1018.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The kernel space runs the core OS operations with direct and unrestricted access to the underlying hardware resources. These operations involve handling IO, memory, process and thread management, etc.</p><p>All the user applications, like browsers, code editors, etc., run in the user space and interact with the underlying hardware through the kernel space. 
The application code that runs in the user space is sandboxed and has to request access to the system resources and other kernel services through system calls hitting the kernel space.</p><p>The separation between the user space and the kernel space is a fundamental design principle in operating systems, facilitating secure, fault-tolerant, and efficient execution of operations.</p><h2>V8 engine, isolates, and the underlying OS</h2><p>The V8 engine runs in the user space and interacts with the OS through the kernel space via system calls. It provides a sandboxed environment to run JavaScript and WebAssembly code.</p><p>When isolates process requests with an event loop and asynchronous IO, the business logic runs in the user space, while the IO tasks rely on the OS kernel to handle the underlying network and other system interactions.</p><p>Each V8 engine instance runs in a user-space process with a process ID and with memory boundaries enforced by the OS. This provides a secure sandboxed V8 execution environment.</p><p>The memory allocated to a V8 instance is distributed across the isolates by the V8 engine. As the number of isolates increases, a V8 engine instance may request more memory from the OS.</p><p>V8 isolates provide strict tenant-level isolation. Though they technically reside in the same OS-allocated memory space inside a V8 engine instance, the V8 architecture ensures no data is shared between these isolates by allocating independent logical memory for each isolate, enforcing strict isolate-level sandboxing.</p><p>Each isolate&#8217;s data (objects, functions, closures, etc.) is stored in a distinct memory region managed by the V8 engine. Furthermore, each isolate has its own garbage collection process, upholding strict isolation.</p><h2>Limitations of the V8 isolate architecture</h2><p>The isolate architecture is a well-engineered solution for deploying latency-sensitive multi-tenant workloads. 
However, it only supports JavaScript and languages such as Go or Rust, whose code can be compiled to WebAssembly.</p><p>Other cloud serverless offerings leveraging containerized and VM-based deployments support a range of programming languages based on workload requirements.</p><p>Furthermore, containers and VMs provide process-level isolation, often preferred by enterprise workloads with strict security and compliance standards, primarily in the finance, government and healthcare space. This level of isolation is not possible with isolates.</p><p>Also, isolates aren't designed for compute-intensive and long-running tasks. They have strict memory and compute limits set by the platform, suiting short, lightweight, IO-intensive operations running at the edge.</p><blockquote><p>A discussion post on IO-bound, CPU-bound, and memory-bound operations is in the pipeline, along with the subsequent posts in the '<a href="https://shivangsnewsletter.com/p/systems-programming-coding-message-broker">Coding a message broker from the bare bones</a>' series. </p><p>I'll also discuss the Firecracker microVMs by AWS, which are lightweight virtual machines providing startup times close to isolates while retaining the flexibility of containers. 
Stay tuned.</p></blockquote><p>If you haven't subscribed yet, do subscribe to my newsletter to have my posts slide into your inbox as soon as they are published.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/subscribe?"><span>Subscribe now</span></a></p><h2>Resources to upskill on cloud computing and implementing distributed systems</h2><p>To learn the fundamentals of cloud computing, including concepts like FaaS, containers, VMs, deployment infrastructure and more, do check out my <a href="https://learnsoftwarearchitecture.com/cloud-computing-101-master-the-fundamentals">cloud course</a>. It's a part of the <a href="https://learnsoftwarearchitecture.com/">zero to system architecture bundle</a>, which takes you from zero to mastering the fundamentals of system architecture.</p><p>Furthermore, to code distributed systems like Git, Redis, Kafka, and more from the bare bones in the programming language of your choice, check out <a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>.</p><p>With their interactive, hands-on exercises, you'll get a good understanding of how these systems work. 
If you decide to make a purchase, you can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a> (affiliate).</p><blockquote><p>If you wish to delve further into V8 isolate architecture, do go through <a href="https://developers.cloudflare.com/workers/reference/security-model/">this resource</a>.</p></blockquote><h2>Recommended reads on systems programming</h2><p>I've implemented a&nbsp;<a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">single-threaded</a>&nbsp;and&nbsp;<a href="https://shivangsnewsletter.com/p/distributed-programming-part-3">multithreaded</a>&nbsp;TCP/IP server in Java. You can go through the linked posts to understand how servers function.</p><p>I am writing a crash course on building a distributed message broker like Kafka from the bare bones. You can read the <a href="https://shivangsnewsletter.com/p/systems-programming-coding-message-broker">first post here</a>.<br><br>Also, check out my post on <a href="https://shivangsnewsletter.com/p/serverless">Serverless Compute &amp; Storage At the Edge With Stateless &amp; Stateful Functions</a>. It&#8217;s an insightful read.</p><p>If you found this post insightful, consider sharing it with your friends for more reach. You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a>&nbsp;as well. 
I'll see you in the next post.</p><p>Until then, Bye!</p>]]></content:encoded></item><item><title><![CDATA[A crash course on building a distributed message broker like Kafka from scratch - Part 1]]></title><description><![CDATA[Code a distributed message broker from the bare bones]]></description><link>https://shivangsnewsletter.com/p/systems-programming-coding-message-broker</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/systems-programming-coding-message-broker</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Thu, 10 Oct 2024 03:56:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello, hope you are doing great!</p><p>Thank you for subscribing to this newsletter.</p><p>This is the first in a series of posts I'll be writing to build a distributed message broker like Kafka from the bare bones in Go. In the posts, along with the code, you'll find discussions on the applicable system architecture concepts to help you get a comprehensive understanding of how distributed systems like Kafka and other log-based systems function and scale.</p><p>The posts collectively will form a crash course on coding distributed systems from scratch.</p><p>With that said, let's jump right in.</p><h2>Kafka is a distributed commit log</h2><p>What does this mean?</p><p>A log-based structure is central to distributed messaging and event streaming systems like Kafka, AWS Kinesis, Redpanda, etc.</p><p>A commit log, often also called an append-only log, maintains a sequence of records of events happening in the system or the changes in the system's state in an immutable append-only fashion. 
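</p><p>To make the append-only idea concrete, here is a minimal in-memory sketch in Go. It is not how Kafka implements its log; the names <code>Record</code>, <code>AppendOnlyLog</code>, <code>Append</code> and <code>Read</code> are assumptions made purely for illustration.</p><pre><code>package main

import "fmt"

// Record is one immutable entry in the log (hypothetical type for this sketch).
type Record struct {
    Offset int    // position of the record in the log
    Value  string // payload of the event
}

// AppendOnlyLog stores records in arrival order.
// It has no update or delete method by design.
type AppendOnlyLog struct {
    records []Record
}

// Append adds a value to the end of the log and returns its offset.
func (l *AppendOnlyLog) Append(value string) int {
    offset := len(l.records)
    l.records = append(l.records, Record{Offset: offset, Value: value})
    return offset
}

// Read returns the record at the given offset and whether it exists.
// A linear scan keeps the sketch short; indexing the slice would be faster.
func (l *AppendOnlyLog) Read(offset int) (Record, bool) {
    for _, r := range l.records {
        if r.Offset == offset {
            return r, true
        }
    }
    return Record{}, false
}

func main() {
    var commitLog AppendOnlyLog
    commitLog.Append("event-1")
    commitLog.Append("event-2")

    if r, ok := commitLog.Read(1); ok {
        fmt.Println(r.Offset, r.Value)
    }
}</code></pre><p>Running the sketch prints <code>1 event-2</code>, the second record read back by its offset.</p><p>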
There is no modification or deletion of any event or data in the log sequence.</p><p>This structure ensures durability and consistency in a distributed system and has several use cases.</p><p>Kafka appends the messages sent to the topics in a commit log, which is an immutable sequence of events consumed by the subscribers.</p><h2>Commit/append-only log data structure use cases</h2><p>Imagine a stock trading service that pushes the stock events to a message broker to be consumed by the downstream services. Maintaining the order of transactions (events) is crucial for the business: it avoids financial inconsistencies, enables smooth transaction reversals, and preserves a record of state changes (aka event sourcing) for auditing and dispute management.</p><p>A commit log fits this use case better than other data structures such as a queue, which message brokers typically leverage. (More in-depth discussion on the architectures of queue-based and log-based systems coming up in the forthcoming posts).</p><p>A distributed commit log keeps the transactions durable, as they can be persisted to disk based on a time duration or the size of the log data. This log data, organized into topics and partitions, can be spread across a cluster of nodes, which increases throughput and makes the service scalable and available.</p><p>If a node fails, the data is not lost, as it is replicated to other nodes with the help of another application of the log: the replicated log. 
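</p><p>As a rough illustration of replication, the sketch below has a leader copy every appended entry to its followers. It is an in-memory simplification under stated assumptions: the <code>Replica</code> and <code>Leader</code> types are hypothetical, and a real system would also handle acknowledgements, network failures and leader election.</p><pre><code>package main

import "fmt"

// Replica is one node's copy of the log (hypothetical type for this sketch).
type Replica struct {
    ID      string
    Entries []string
}

// Leader owns a local copy of the log and a list of follower replicas.
type Leader struct {
    Replica
    Followers []*Replica
}

// Append writes the entry to the leader's log first and then copies it,
// in the same order, to every follower.
func (l *Leader) Append(entry string) {
    l.Entries = append(l.Entries, entry)
    for _, f := range l.Followers {
        f.Entries = append(f.Entries, entry)
    }
}

func main() {
    followers := []*Replica{{ID: "node-2"}, {ID: "node-3"}}
    leader := Leader{Replica: Replica{ID: "node-1"}, Followers: followers}

    leader.Append("event-1")
    leader.Append("event-2")

    // Every replica now holds the same entries in the same order.
    fmt.Println(leader.ID, leader.Entries)
    for _, f := range followers {
        fmt.Println(f.ID, f.Entries)
    }
}</code></pre><p>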
The replicated log ensures all the nodes in the cluster have identical copies of data, keeping a consistent cluster state.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aDLW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aDLW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png 424w, https://substackcdn.com/image/fetch/$s_!aDLW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png 848w, https://substackcdn.com/image/fetch/$s_!aDLW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png 1272w, https://substackcdn.com/image/fetch/$s_!aDLW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aDLW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png" width="1456" height="644" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:644,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42291,&quot;alt&quot;:&quot;distributed commit log append-only log&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="distributed commit log append-only log" title="distributed commit log append-only log" srcset="https://substackcdn.com/image/fetch/$s_!aDLW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png 424w, https://substackcdn.com/image/fetch/$s_!aDLW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png 848w, https://substackcdn.com/image/fetch/$s_!aDLW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png 1272w, https://substackcdn.com/image/fetch/$s_!aDLW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff19fb2b0-7d78-4166-8e9a-d364aadf954c_1883x833.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Furthermore, since the data structure is append-only, the transactions or events can be replayed to understand a certain flow of system events without any fuss.</p><h2>Other uses of append-only commit logs</h2><p>Kafka isn't the first implementation of the append-only log. Databases use commit logs called write-ahead logs to record data changes before they are applied to the actual data.</p><p>In the event of a failure, DBs rely on these logs to recover and maintain data consistency. These logs are a key enabler of atomicity and durability, the A and D in ACID.</p><p>Consensus protocols like Raft maintain a replicated log of changes to keep the cluster consistent. 
The leader appends the events to the log, and the log is then replicated to all the follower nodes in the cluster.</p><p>Now that we have an understanding of the central data structure of our message broker, let's create a cluster node to store a log.</p><pre><code>type Node struct {
  ID        string
  Status    string
  CommitLog []*storage.Log
}</code></pre><p>Before I delve into the code explanation, let's understand what a node is. What does it mean, really?</p><h2>Cluster node</h2><p>A node represents a single instance in a cluster or a distributed system. We can see it as a single unit of compute or storage, and it can be implemented/deployed in different forms, depending on the context.</p><p>When we run a message broker cluster of multiple nodes on our local machine, where each node is responsible for storing logs and interacting with producers and consumers, a single node is a single isolated process running on that machine.</p><p>Every node in the cluster runs as an isolated process on its own port on our local machine. These nodes have process-level isolation, which simulates a distributed cluster.</p><blockquote><p>In case you wish to understand ports, I've discussed ports, sockets and IP addresses in another post where I've implemented a <a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">single-threaded TCP/IP server</a> in Java. Do check it out.</p></blockquote><p>If we run all these nodes as separate containers in Docker, they will have container-level isolation on our local machine. Every container will have its own broker code, dependencies and environment.</p><p>This is closer to a production environment, where all the nodes are isolated and run independently.</p><p>Now, if we deploy these container instances of nodes on different virtual machines on AWS, Azure or GCP, they will have VM-level isolation. Or if we deploy these nodes on bare-metal servers, they will have physical server isolation.</p><p>In this scenario, we can say one bare-metal server is one node of our distributed message broker. So, what a node is in a distributed system totally depends on the context. 
At the same time, we can see it as a single unit of compute or storage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!prfi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!prfi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png 424w, https://substackcdn.com/image/fetch/$s_!prfi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png 848w, https://substackcdn.com/image/fetch/$s_!prfi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!prfi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!prfi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png" width="1456" height="878" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180363,&quot;alt&quot;:&quot;node in a cluster&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="node in a cluster" title="node in a cluster" srcset="https://substackcdn.com/image/fetch/$s_!prfi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png 424w, https://substackcdn.com/image/fetch/$s_!prfi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png 848w, https://substackcdn.com/image/fetch/$s_!prfi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!prfi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6747758a-43d0-4128-a6b5-e0e41a8abef3_2546x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let's get back to our node code.</p><h2>Creating a node for our message broker</h2><pre><code>type Node struct {
  ID        string         // unique identifier for the node
  Status    string         // current state, e.g. idle, active, down
  CommitLog []*storage.Log // append-only list of log entries
}</code></pre><p>The above Go struct represents our message broker node. It contains three fields: ID, Status and CommitLog, which is an append-only list of Logs.</p><p>ID is a unique identifier for the node. The string type allows for a flexible ID that may be a UUID or any other custom identifier. Status indicates the current status of the node (e.g., idle, active, down, terminated, etc.), and CommitLog is an append-only list of events or messages (each contained within a Log struct) that the node receives from the event producer.</p><p><code>Log</code> in <code>CommitLog []*storage.Log</code> is a struct that represents an entry in a commit log, and <code>storage</code> is the package that houses <code>log.go</code>, the file containing the Log struct code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oy-a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oy-a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png 424w, https://substackcdn.com/image/fetch/$s_!oy-a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png 848w, https://substackcdn.com/image/fetch/$s_!oy-a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png 1272w, 
https://substackcdn.com/image/fetch/$s_!oy-a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oy-a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png" width="437" height="482" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:437,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27259,&quot;alt&quot;:&quot;Go project structure&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Go project structure" title="Go project structure" srcset="https://substackcdn.com/image/fetch/$s_!oy-a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png 424w, https://substackcdn.com/image/fetch/$s_!oy-a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png 848w, https://substackcdn.com/image/fetch/$s_!oy-a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png 1272w, 
https://substackcdn.com/image/fetch/$s_!oy-a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb14e87c-2e16-4c18-a1e3-2e35507ea5da_437x482.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here is the complete Node.go code:</p><pre><code>type Node struct {
&nbsp; ID &nbsp; &nbsp; string
&nbsp; Status string
&nbsp; CommitLog []*storage.Log
}

func NewNode(id string) *Node {
  return &amp;Node{
    ID:        id,
    Status:    "active",
    CommitLog: []*storage.Log{},
  }
}

func (n *Node) AddLogEntry(log *storage.Log) {
  if log == nil {
    logrus.Warn("Attempted to add a nil log entry")
    return
  }

  n.CommitLog = append(n.CommitLog, log)
  logrus.Infof("Log added to Node ID %s: %s\n", n.ID, log.Message)
}

func (n *Node) PrintStatus() {
  logrus.WithFields(logrus.Fields{
    "NodeID": n.ID,
    "Status": n.Status,
  }).Info("Node status")
}

func (n *Node) PrintCommitLog() {
  if len(n.CommitLog) == 0 {
    logrus.Info("CommitLog is empty.")
    return
  }

  for index, logEntry := range n.CommitLog {
    logrus.WithFields(logrus.Fields{
      "Index":     index + 1,
      "Timestamp": logEntry.Timestamp.Format(time.RFC3339),
      "Message":   logEntry.Message,
    }).Info("Commit log entry")
  }
}</code></pre><h2>Event log</h2><pre><code>type Log struct {
&nbsp; Timestamp time.Time
&nbsp; Message string
}</code></pre><p>The Log struct, which represents an individual entry in the commit log, has two fields: Timestamp, the time at which the event was logged, and Message, the event payload, in other words, the actual data being logged.</p><p>Timestamp uses Go's time package for precise date and time information, and for simplicity the payload is accepted as a string. In forthcoming posts, as we add more features to our message broker, we can introduce a richer data type for the event payload in place of the string.</p><p>Here is the complete Log.go code:</p><pre><code>type Log struct {
&nbsp; Timestamp time.Time
&nbsp; Message &nbsp; string
}

func NewLog(message string) *Log {
  if strings.TrimSpace(message) == "" {
    logrus.Warn("Creating a log with an empty or blank message")
  }

  return &amp;Log{
    Timestamp: time.Now(),
    Message:   message,
  }
}

func (l *Log) PrintLog() {
  logrus.WithFields(logrus.Fields{
    "Timestamp": l.Timestamp.Format(time.RFC3339),
    "Message":   l.Message,
  }).Info("Log entry")
}</code></pre><p>The Log.go file has a constructor function that takes the event payload as a string and returns a new Log instance.</p><pre><code>func NewLog(message string) *Log {
  if strings.TrimSpace(message) == "" {
    logrus.Warn("Creating a log with an empty or blank message")
  }

  return &amp;Log{
    Timestamp: time.Now(),
    Message:   message,
  }
}</code></pre><p>We also have a method, PrintLog(), defined on the Log struct in the same file.</p><pre><code>func (l *Log) PrintLog() {
  logrus.WithFields(logrus.Fields{
    "Timestamp": l.Timestamp.Format(time.RFC3339),
    "Message":   l.Message,
  }).Info("Log entry")
}</code></pre><p>This method logs the details of the Log using the Logrus logging library.</p><p>Logging helps us keep track of events that happen in a distributed system comprising multiple nodes. It helps with debugging and understanding the flow of events.</p><p>The Node.go file has a similar constructor function, NewNode, which takes the node ID as an argument and returns a new node, and a PrintStatus method that logs the node's status.</p><pre><code>func NewNode(id string) *Node {
  return &amp;Node{
    ID:        id,
    Status:    "active",
    CommitLog: []*storage.Log{},
  }
}

func (n *Node) PrintStatus() {
  logrus.WithFields(logrus.Fields{
    "NodeID": n.ID,
    "Status": n.Status,
  }).Info("Node status")
}</code></pre><p>We have another method, AddLogEntry, which adds a log entry to the Node.</p><pre><code>func (n *Node) AddLogEntry(log *storage.Log) {
  if log == nil {
    logrus.Warn("Attempted to add a nil log entry")
    return
  }

  n.CommitLog = append(n.CommitLog, log)
  logrus.Infof("Log added to Node ID %s: %s\n", n.ID, log.Message)
}</code></pre><h2>Running the code</h2><p>In main.go, in the main function, we create a new message broker node, print its status, then create a new log entry manually and append it to the CommitLog.</p><pre><code>func main() {
&nbsp; logrus.SetFormatter(&amp;logrus.TextFormatter{FullTimestamp: true})
&nbsp; logrus.Info("Starting the message broker...")

&nbsp; nodeID := "node-1"
&nbsp; brokerNode := broker.NewNode(nodeID)
&nbsp; brokerNode.PrintStatus()
&nbsp; logEntry := storage.NewLog("Hello, world! This is the first log message.")
  brokerNode.AddLogEntry(logEntry)
  brokerNode.PrintCommitLog()
}</code></pre><p>In case you are wondering where broker and storage come from: they are the packages that contain Node.go and Log.go, which define the NewNode and NewLog constructor functions, respectively.</p><p>Here are the results in the terminal:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!35HM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!35HM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png 424w, https://substackcdn.com/image/fetch/$s_!35HM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png 848w, https://substackcdn.com/image/fetch/$s_!35HM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png 1272w, https://substackcdn.com/image/fetch/$s_!35HM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!35HM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png" width="1418" height="130" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:130,&quot;width&quot;:1418,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18551,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!35HM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png 424w, https://substackcdn.com/image/fetch/$s_!35HM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png 848w, https://substackcdn.com/image/fetch/$s_!35HM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png 1272w, https://substackcdn.com/image/fetch/$s_!35HM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F572cd261-c007-4495-ad12-e6e7d3e015b7_1418x130.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>Summary</h2><p>To summarize: we created a Node struct that represents a message broker node. It has a NewNode constructor function that returns a new node instance, along with the AddLogEntry, PrintStatus and PrintCommitLog methods.</p><p>We then created a Log struct, which represents an individual entry in the commit log. 
It has a NewLog constructor function, which creates a new Log instance, and a PrintLog method, which prints it.</p><p>In the main function, I created a new node via the NewNode constructor function with "node-1" as the node ID and then printed its status.</p><pre><code>time="2025-01-21T18:51:43+05:30" level=info msg="Node status" NodeID=node-1 Status=active</code></pre><p>I then created a new Log instance with the NewLog constructor function, added it to the CommitLog via the AddLogEntry method and printed the CommitLog.</p><pre><code>time="2025-01-21T18:51:43+05:30" level=info msg="Log added to Node ID node-1: Hello, world! This is the first log message.\n" 

time="2025-01-21T18:51:43+05:30" level=info msg="Commit log entry" Index=1 Message="Hello, world! This is the first log message." Timestamp="2025-01-21T18:51:43+05:30"</code></pre><p>In forthcoming posts, we will extend the code with a producer API through which the message broker ingests logs, so that we no longer have to create a log entry manually and append it to the commit log.</p><p>We will also create a consumer API that lets consumers read logs. Subsequently, we will implement concepts such as message broker topics and partitions, and discuss the intricacies involved.</p><p>That I've coded this in Go does not mean you have to use the same language. Since I've explained the high-level flow of the system, you can implement the message broker in the programming language of your choice.</p><p>Furthermore, you can:</p><h2>Practice coding distributed systems in the programming language of your choice</h2><p><a href="https://bit.ly/3swSHHl">CodeCrafters</a> is a platform designed to help developers learn by building distributed systems like Redis, Docker, Git, Kafka, etc., from the bare bones.</p><p>With their interactive, hands-on exercises, you'll not only deeply understand how these complex systems work but also implement them step by step in the programming language of your choice and grow your engineering skills, becoming a deft software engineer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PiaP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!PiaP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png 424w, https://substackcdn.com/image/fetch/$s_!PiaP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png 848w, https://substackcdn.com/image/fetch/$s_!PiaP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png 1272w, https://substackcdn.com/image/fetch/$s_!PiaP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PiaP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png" width="1327" height="870" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1327,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114542,&quot;alt&quot;:&quot;codecrafters&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="codecrafters" title="codecrafters" 
srcset="https://substackcdn.com/image/fetch/$s_!PiaP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png 424w, https://substackcdn.com/image/fetch/$s_!PiaP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png 848w, https://substackcdn.com/image/fetch/$s_!PiaP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png 1272w, https://substackcdn.com/image/fetch/$s_!PiaP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42beb2ef-01cb-417c-adbf-7a8cb21c24e3_1327x870.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Do check out their platform. If you enjoy the experience and decide to make a purchase, you can use&nbsp;<a href="https://bit.ly/3QL4TN0">my unique link to get 40% off</a>.</p><p>The above link is an affiliate link. When you make a purchase through it, I get a small cut without you paying anything extra. Cheers!</p><h2>Further reads on systems programming</h2><p>I've implemented a <a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">single-threaded</a> and <a href="https://shivangsnewsletter.com/p/distributed-programming-part-3">multithreaded</a> TCP/IP server in Java. You can go through the linked posts to understand how servers function.</p><p>You'll find more systems programming and systems architecture posts in this newsletter, published from time to time. If you haven't subscribed yet, do subscribe to have these posts slide into your inbox as soon as they are published.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/subscribe?"><span>Subscribe now</span></a></p><p>If you found the post helpful, consider sharing it with your friends for more reach. You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a> &amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a> and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a> as well. I'll see you in the next post. 
Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[How Monzo and DoorDash integrate external libraries with their code and how do web-scale companies manage a large number of microservices]]></title><description><![CDATA[Monzo, a UK-based online bank, recently shared insights into how they run migrations across 2800 microservices.]]></description><link>https://shivangsnewsletter.com/p/managing-microservices</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/managing-microservices</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Tue, 03 Sep 2024 12:23:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!GjfT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Monzo, a UK-based online bank, recently shared insights into how they run migrations across 2800 microservices. In this post, I'll quickly bring up a key aspect they discussed in the article: how they integrate their application code with external libraries. 
The approach lets them switch to new libraries without significant code refactoring, which makes migrations efficient.</p><p>Instead of integrating third-party code directly into their own, which results in tight coupling, they create an abstraction layer that communicates with the external code on their behalf, acting as an adapter.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GjfT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GjfT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png 424w, https://substackcdn.com/image/fetch/$s_!GjfT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png 848w, https://substackcdn.com/image/fetch/$s_!GjfT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png 1272w, https://substackcdn.com/image/fetch/$s_!GjfT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GjfT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png" width="558" height="488" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b163db2-b60d-44f5-8efd-344c61984655_558x488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:488,&quot;width&quot;:558,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Alt text: diagram showing how a Monzo service uses the new Monzo tracing sdk, which in turn wraps the deprecated opentracing sdk.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Alt text: diagram showing how a Monzo service uses the new Monzo tracing sdk, which in turn wraps the deprecated opentracing sdk." title="Alt text: diagram showing how a Monzo service uses the new Monzo tracing sdk, which in turn wraps the deprecated opentracing sdk." 
srcset="https://substackcdn.com/image/fetch/$s_!GjfT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png 424w, https://substackcdn.com/image/fetch/$s_!GjfT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png 848w, https://substackcdn.com/image/fetch/$s_!GjfT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png 1272w, https://substackcdn.com/image/fetch/$s_!GjfT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b163db2-b60d-44f5-8efd-344c61984655_558x488.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Img src: <a href="https://monzo.com/blog/how-we-run-migrations-across-2800-microservices">Monzo</a></figcaption></figure></div><p>With the abstraction layer, based on dynamic configuration, they can switch between multiple libraries without the need for significant code refactoring. During a library change, the only layer that might need refactoring is the abstraction layer. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KxTf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KxTf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png 424w, https://substackcdn.com/image/fetch/$s_!KxTf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png 848w, https://substackcdn.com/image/fetch/$s_!KxTf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KxTf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KxTf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png" width="944" height="478" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:478,&quot;width&quot;:944,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A diagram showing how an example Monzo service the new Monzo tracing sdk, and how this in turn wraps both the deprecated OpenTracing sdk and the new OpenTelemetry sdk&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A diagram showing how an example Monzo service the new Monzo tracing sdk, and how this in turn wraps both the deprecated OpenTracing sdk and the new OpenTelemetry sdk" title="A diagram showing how an example Monzo service the new Monzo tracing sdk, and how this in turn wraps both the deprecated OpenTracing sdk and the new OpenTelemetry sdk" srcset="https://substackcdn.com/image/fetch/$s_!KxTf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png 424w, 
https://substackcdn.com/image/fetch/$s_!KxTf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png 848w, https://substackcdn.com/image/fetch/$s_!KxTf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png 1272w, https://substackcdn.com/image/fetch/$s_!KxTf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1eb9922c-63f0-41dc-8d36-3f051be33309_944x478.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" 
y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Img src: <a href="https://monzo.com/blog/how-we-run-migrations-across-2800-microservices">Monzo</a></figcaption></figure></div><p>This approach also enabled Monzo Engineering to instrument external libraries with their own telemetry libraries, in addition to rolling forward and back without the need to redeploy all services.</p><p>DoorDash follows a similar approach when working with cache libraries like Redis and Caffeine. I've discussed this in a <a href="https://shivangsnewsletter.com/p/caching-in-microservices-architecture">previous newsletter post</a>. Do give it a read.</p><p>As a rule of thumb, we should avoid tightly coupling external code with the application code and should always have an abstraction layer in between. This markedly increases the flexibility and reusability of our code.</p><p>How, then, do web-scale companies manage such a large number of microservices? Clearly, 2800 microservices does not mean a dedicated team owning and managing every microservice. Uber has close to 4K microservices and Netflix over 1K.</p><h2>Managing a large number of microservices</h2><p>Automation and efficient microservices tooling are key to managing and deploying such a large number of microservices at scale. For instance, Monzo has built mass deployment tooling that allows them to push out library changes across all services as an asynchronous batch job.</p><p>Microservices are tagged with criticality tiers, and the least critical are deployed first, leveraging automated rollback checks using <a href="https://argoproj.github.io/rollouts/">Argo Rollouts</a>.</p><p>Small, revertible changes that are quick to implement and validate are deployed first, keeping the blast radius minimal if things go wrong.</p><p>Strict monitoring, with alerts configured in Grafana and Prometheus, and auditing checks are put in place. 
As an example, the deployment pipeline automatically blocks operations like deploying a version to production that has not been deployed to staging yet. Furthermore, a record of pipeline events is maintained to quickly diagnose and roll back changes if anything goes wrong.</p><p>A lot of config-controlled abstraction has been put in place, as I discussed before, to facilitate efficient interaction with the underlying technologies, which also enhances the development experience.</p><p>Emphasis is put on technology consistency, that is, using similar technologies across microservices as much as possible to reduce friction when engineers move across services. This also enables them to reuse tooling written for a certain technology across microservices.</p><p>Similarly, every web-scale company leverages a set of tools to manage and scale their systems comprising a considerable number of microservices.</p><p>Service discovery and registry tools like Consul, Eureka, etc., enable microservices to discover each other dynamically and work together smoothly. Service mesh tools like Istio, Linkerd, etc., manage service-to-service communication, handling things like retries and circuit-breaking. Load balancers like Envoy, NGINX, etc., distribute traffic across service instances.</p><p>API gateways like Zuul manage access to backend services, also managing cross-cutting concerns like authentication, rate limiting, and caching. Container tooling like Docker and orchestrators like Kubernetes manage the container-based deployment and scaling of microservices.</p><p>Distributed tracing tools like Zipkin, Jaeger, etc., facilitate request tracking across the distributed microservices architecture. Metric and logging tools like Prometheus, Grafana and the ELK stack help monitor system errors, performance and health.</p><blockquote><p>I&#8217;ve discussed <a href="https://shivangsnewsletter.com/p/observability-in-distributed-systems">observability</a> in detail here. 
Do check it out.</p></blockquote><p>To manage configuration settings across thousands of microservices, tools like HashiCorp Vault are leveraged. CI/CD tools like Jenkins, GitLab CI, etc., are leveraged to efficiently deploy these services.</p><p>Infrastructure as code tools like Terraform, CloudFormation, etc., facilitate programmatic infrastructure management. Netflix's <a href="https://netflix.github.io/chaosmonkey/">Chaos Monkey</a> randomly terminates instances in production to ensure the services are resilient to instance failures.</p><p>In summary, web-scale companies leverage a lot of intricate tooling and robust system architecture with automation and observability to manage and scale a very large number of microservices.</p><p>For further reading, do check out the <a href="https://monzo.com/blog/how-we-run-migrations-across-2800-microservices">Monzo microservices migration</a> post.</p><blockquote><p>If you wish to go from zero to confidently contributing to system design discussions at your workplace, making informed decisions while having a firm grasp on the fundamentals, regardless of your current role and experience level, check out the&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Mastering System Architecture learning path</a> that educates you step by step on web architecture, cloud computing, the infrastructure supporting scalable web services and distributed system design, starting right from zero.</p></blockquote><blockquote><p><a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a> is a platform that helps you code distributed systems like Redis, Docker, Git, a DNS server, etc., step-by-step from the bare bones in the programming language of your choice. With their hands-on courses, you get an in-depth understanding of distributed systems and advanced system design concepts, which helps you become a more proficient engineer. 
</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a> (affiliate) if you decide to make a purchase. Moreover, I've written a few posts on <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">coding distributed systems</a> from the bare bones on my newsletter. You can check them out <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">here</a>.</p></blockquote><p>If you found this post insightful,&nbsp;do share the&nbsp;<a href="https://shivangsnewsletter.com/p/managing-microservices">web link</a> with your network for more reach. You can connect with me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a> &amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;in addition to replying to this post.</p><p>I'll see you around in my next post. Until then, Cheers!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Web Scale! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Kafka tiered-storage at Uber and using it as a system of record]]></title><description><![CDATA[Kafka is heavily used in Uber's tech stack, serving several critical use cases, including batch and real-time systems.]]></description><link>https://shivangsnewsletter.com/p/kafka-tiered-storage-at-uber</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/kafka-tiered-storage-at-uber</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Wed, 28 Aug 2024 05:28:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9c6_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kafka is heavily used in Uber's tech stack, serving several critical use cases, including batch and real-time systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9c6_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9c6_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png 
424w, https://substackcdn.com/image/fetch/$s_!9c6_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png 848w, https://substackcdn.com/image/fetch/$s_!9c6_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png 1272w, https://substackcdn.com/image/fetch/$s_!9c6_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9c6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png" width="1456" height="743" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:743,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!9c6_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png 424w, 
https://substackcdn.com/image/fetch/$s_!9c6_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png 848w, https://substackcdn.com/image/fetch/$s_!9c6_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png 1272w, https://substackcdn.com/image/fetch/$s_!9c6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91ebf6bf-d0f9-4e34-8c7d-f25213635ad1_1600x817.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Img src: <a href="https://www.uber.com/en-IN/blog/kafka-tiered-storage/">Uber</a></figcaption></figure></div><p>The total storage in Kafka clusters depends on factors like the number of topic partitions, throughput and retention configuration. To scale cluster storage, more nodes must be added, and these bring along additional CPU and memory that are not actually needed.</p><p>In this scenario, the storage and compute are tightly coupled, which causes issues with scalability, flexibility and deployability. Furthermore, storing data in local cluster nodes is expensive and increases cluster complexity, in contrast to storing data in remote storage, such as a cloud object store.</p><p>To tackle this, <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage?uclick_id=baa1d748-f6ba-4329-ba82-f691f6a0b2c7">Uber Engineering proposed Kafka tiered storage</a>, where the Kafka broker would have two tiers of storage, local and remote, each with its own data retention policy.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S-zH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S-zH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png 424w, 
https://substackcdn.com/image/fetch/$s_!S-zH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png 848w, https://substackcdn.com/image/fetch/$s_!S-zH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png 1272w, https://substackcdn.com/image/fetch/$s_!S-zH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S-zH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png" width="1456" height="386" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:386,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!S-zH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png 424w, 
https://substackcdn.com/image/fetch/$s_!S-zH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png 848w, https://substackcdn.com/image/fetch/$s_!S-zH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png 1272w, https://substackcdn.com/image/fetch/$s_!S-zH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f44189f-f5b3-4cc3-b790-b70c5b5e84c4_1600x424.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" 
y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Img src: <a href="https://www.uber.com/en-IN/blog/kafka-tiered-storage/">Uber</a></figcaption></figure></div><p>The local tier is the broker's local storage, while the remote tier is extended storage with a significantly longer retention period than the local tier.</p><p>Native support for extended storage in Kafka decouples storage and compute, decreases the load on the local storage and enables systems to store data for a significantly longer duration with less complexity and optimized costs.</p><p>Separating storage and compute is a common practice in modern web architecture. I'll be delving into this in my future posts. For a detailed read on Kafka's tiered-storage architecture at Uber, check out this <a href="https://www.uber.com/en-IN/blog/kafka-tiered-storage/">Uber engineering blog post</a>.</p><h2>With persistent storage natively available and compute decoupled from storage by the tiered storage architecture, can we use Kafka as a system of record or a database?</h2><p>Kafka is a distributed append-only immutable log. The immutable log is helpful in use cases where the history of events needs to be retained and replayed with high throughput.</p><p>A database offers features built for storing and querying application state, such as in-place updates, indexes and ad hoc queries. Kafka is clearly not a fit for use cases that require a dedicated database. 
However, if we are already using Kafka in our architecture and it serves our persistence requirements well, we may use it as a system of record and avoid plugging in a dedicated database for simplicity.</p><p>KOR Financial, a financial services startup, leveraged this approach using <a href="https://thenewstack.io/ditching-databases-for-apache-kafka-as-system-of-record/">Kafka&nbsp;as the system of record</a> instead of relying on relational databases to store data.</p><p>Their data streaming architecture, powered by Kafka, captures events in addition to the system state, handling hundreds of petabytes of data cost-effectively. They leverage Confluent Cloud as an extended storage platform to store data for as long as they want.</p><h2>Using Kafka as a database</h2><p>Furthermore, Confluent has an excellent article on <a href="https://www.confluent.io/blog/is-kafka-a-database-with-ksqldb/">whether Kafka can be used as a database</a> and it's a recommended read.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k_iS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k_iS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png 424w, https://substackcdn.com/image/fetch/$s_!k_iS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png 848w, 
https://substackcdn.com/image/fetch/$s_!k_iS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png 1272w, https://substackcdn.com/image/fetch/$s_!k_iS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k_iS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png" width="1456" height="509" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:509,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;api-manager&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="api-manager" title="api-manager" srcset="https://substackcdn.com/image/fetch/$s_!k_iS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png 424w, https://substackcdn.com/image/fetch/$s_!k_iS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png 848w, 
https://substackcdn.com/image/fetch/$s_!k_iS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png 1272w, https://substackcdn.com/image/fetch/$s_!k_iS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127c8707-c2de-4c23-87c4-2922016596d6_1999x699.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Img src: <a 
href="https://www.confluent.io/blog/is-kafka-a-database-with-ksqldb/">Confluent</a></figcaption></figure></div><p>They discuss a CRUD service that streams all the CRUD events into a Kafka producer that further updates the database and streams the event to a Kafka Customer topic.</p><p>After every update or delete operation, the database holds a single final state. Kafka, by contrast, appends each update or delete event to the log as a new record in sequence, rather than updating or deleting a single source of data in place.</p><p>The article further discusses the use of ksqlDB (built on top of Kafka) as an alternative to the database in the system architecture. <a href="https://www.confluent.io/blog/is-kafka-a-database-with-ksqldb/">Do give it a read</a>.</p><blockquote><p>If you wish to go from zero to confidently contributing to system design discussions at your workplace, making informed decisions while having a firm grasp on the fundamentals, regardless of your current role and experience level, check out the&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Mastering System Architecture learning path</a> that educates you step by step on web architecture, cloud computing, the infrastructure supporting scalable web services and distributed system design, starting right from zero.</p></blockquote><blockquote><p>To become the best in your game as a product developer, separating yourself from the chaff with an understanding of the nuances of the trade, <a href="https://learnsoftwarearchitecture.com/devroadmap">check out this resource</a>.</p></blockquote><p>This post touched upon how businesses are leveraging Kafka's data storage features and how those compare with a traditional database. If you found this post insightful,&nbsp;do share the&nbsp;<a href="https://shivangsnewsletter.com/p/kafka-tiered-storage-at-uber">web link</a>&nbsp;with your network for more reach. 
You can connect with me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a> &amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;in addition to replying to this post.</p><p>I'll see you around in my next post. Until then, Cheers!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[How Zerodha scaled from zero to 11 million users: Key takeaways]]></title><description><![CDATA[Zerodha (India's largest stockbroker) recently gave a talk on YouTube on how they scaled from zero to 11M users.]]></description><link>https://shivangsnewsletter.com/p/zerodha-system-scale</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/zerodha-system-scale</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Fri, 16 Aug 2024 06:37:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d35d7e-83e0-445f-99b4-7a2f9f9c37b5_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Zerodha (India's largest stockbroker) recently gave a <a href="https://www.youtube.com/watch?v=tkH-r4oJEQI&amp;t=2092s">talk on YouTube</a> on how they scaled from zero to 11M users.</p><p><strong>Here is the gist</strong>:</p><p>A decade back they started with automating financial processes that they ran on Excel sheets with Python scripts. 
With this, processes that required 2 work hours were cut down to 2 seconds.</p><p>They moved on to set up a self-hosted Postgres database that today hosts tens of terabytes of data, comprising hundreds of billions of rows of financial records.</p><p>Self-hosting a technology, tinkering with it, getting acquainted and eventually mastering it proved to be a crucial factor in enabling the company to grow, keeping the costs at a bare minimum, in addition to avoiding the licensing costs that would come along when using proprietary managed services.</p><p>Roughly 2M+ concurrent users log in as the markets open in the morning, handled by Redis instances that have been running for years with zero maintenance. The infrastructure is hosted on AWS instances without the use of any managed services.</p><p>Their original ticketing system was a Gmail inbox that a dozen support people logged into simultaneously. Later, when the number of clients became sizeable, they deployed a PHP-based support ticketing system, which still powers support today and is used by a thousand employees handling millions of clients and a similar number of tickets.</p><p>The support system infrastructure costs them peanuts (a few hundred dollars every year) in contrast to a managed support SaaS, which would have cost them millions of dollars annually.</p><p>Self-hosting everything took time initially, but in the longer run helped them scale, keeping the costs at a bare minimum, in addition to avoiding vendor lock-in.</p><p>They originally built the backend services in Python around 2014 but ran into performance issues and bottlenecks, which led them to rewrite their services in Go. Their high-frequency concurrent trading service today runs on Go with a response baseline of less than 50ms.</p><p>For infrastructure orchestration, they leverage <a href="https://www.nomadproject.io/">Nomad</a>. 
With it, they achieved visibility into their infrastructure, enabling a new developer to make changes significantly faster by understanding a few lines of configuration code.</p><p>They built an in-house open-source email software called <a href="https://listmonk.app/">listmonk</a>, which they leverage to send out 200 million emails every month at a cost of just a couple of hundred dollars in hosting.</p><p>Their tech team comprises 30 people with the core philosophy of staying close to the software via self-hosting, keeping things simple and keeping external dependencies to an absolute minimum, which helps them avoid the layers of abstraction and opacity that come with managed software. This philosophy has significantly contributed to their profit margins.</p><p>They believe having to invest significant energy managing self-hosted software may have been an issue a decade ago but is not now, given how robust open source software is today.</p><p>During COVID-19, the number of orders on their platform grew sixfold, from 2 million to 12 million per day, and the same philosophy helped them scale with the same team size without changing the infrastructure or provisioning additional hardware.</p><blockquote><p>If you wish to go from zero to confidently contributing to system design discussions at your workplace, making informed decisions while having a firm grasp on the fundamentals, regardless of your current role and experience level, check out the&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Mastering System Architecture learning path</a>&nbsp;that educates you step by step on web architecture, cloud computing, the infrastructure supporting scalable web services and distributed system design, starting right from zero.</p></blockquote><p>If you found this post insightful,&nbsp;do share the <a href="https://shivangsnewsletter.com/p/zerodha-system-scale">web link</a> with your network for more reach. 
You can connect with me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a> in addition to replying to this post.</p><p>I'll see you around in my next post. Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[The thought process behind picking the right database for our service with a reference checklist]]></title><description><![CDATA[Picture the below data persistence requirements: The DB should be able to handle a minimum of 1.5 million RPS (Request Per Second) with 1 million writes per second and 500K reads per second. Eventual data consistency is fine. 
Low response latency (< 5 ms)]]></description><link>https://shivangsnewsletter.com/p/pick-the-right-database-for-your-service</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/pick-the-right-database-for-your-service</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Thu, 18 Jul 2024 10:31:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d35d7e-83e0-445f-99b4-7a2f9f9c37b5_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Picture the below data persistence requirements:</p><ul><li><p>The DB should be able to handle a minimum of 1.5 million RPS (Requests Per Second) with 1 million writes per second and 500K reads per second.</p></li><li><p>Eventual data consistency is fine.&nbsp;</p></li><li><p>Low response latency (&lt; 5 ms)</p></li><li><p>Is highly available and scales on demand</p></li></ul><p>Given these requirements, how would we pick the fitting database? How do we start our research? Do share with me your thoughts off the top of your head by replying to this email without running a search :)</p><p>Reading eventual consistency and horizontal scalability on demand immediately inclines us towards the NoSQL realm, and our minds start to rev, thinking of popular NoSQL databases. But before that, we should ideally talk about the type of data we are dealing with and the typical application queries.</p><p>Is the data relational in nature, hierarchical, graph-based, key-value, semi-structured or unstructured? 
This is key in picking the fitting database to store our data efficiently.</p><p>If the data is relational, then a relational database is the best fit, and we can further delve into techniques to make it more read-write performant along with fulfilling other requirements.</p><p>Storing data in a relational database does not necessarily mean that queries will always be slow due to JOINs and such, or that it cannot be scaled. It all depends. It depends on the DB design, application queries, nature of the data, traffic patterns and such.</p><blockquote><p>Every database has its quirks and favors a certain type of data and read-write patterns. Being thorough with the type of data we are dealing with and the queries our service has to deal with helps us dramatically in narrowing down our search.</p><p>Also, it goes without saying that to run an effective technology search, we should be aware of the fundamentals of relational and non-relational databases, their strengths and weaknesses and such, including keeping tabs on the new persistence technologies that are being continually launched.</p></blockquote><p>Taking our discussion further, let's say the data is non-relational, which takes us down the NoSQL road, offering us options such as DynamoDB, Redis, Cassandra, ScyllaDB, MongoDB and more.</p><p>A good idea would be to go through the specifications of every database and check the benchmarks, the consistency levels they support, how they scale, throughput, replication strategies, how they manage distributed deployments, the support they offer, and so on.</p><p>Additionally, we need to scan the engineering blogs of large-scale services to understand how they are leveraging these products in their services.</p><p>Cloud databases fit best if we need managed solutions as opposed to self-hosting them. Serverless databases are a popular option in that realm. But at the same time, they lock us in to the vendor. 
This is a trade-off.</p><p>Once we narrow the choice down to one or a few databases, the next step is to run a POC (Proof Of Concept). From there, we move to a pilot deployment and then to a full-scale deployment.</p><p>Only a POC can give us concrete data and observed outcomes on whether a certain database fits our project requirements, in addition to providing an idea of the deployment costs.</p><div><hr></div><h2>POC (Proof Of Concept)</h2><p>Rather than picking a tech based on how much it shines, we need to run a POC (Proof Of Concept) to verify if it fits our technical requirements. Running a POC enables us to measure performance metrics such as scalability, throughput and resource consumption in addition to gauging the running costs.&nbsp;</p><p>We can figure out potential bottlenecks with a certain technology pretty early in the process, ensuring the tech can handle the expected volume of traffic and transactions. This averts the significant waste of time and resources we would face if the same issue surfaced during full-scale implementation.&nbsp;</p><p>A POC gives us concrete data and observed outcomes with which we can make further informed decisions. 
So, for instance, during the process of picking the right persistence technology for our use case, we can:</p><ul><li><p>Set up the new database in a test environment.</p></li><li><p>Migrate a subset of our current data to the new database.</p></li><li><p>Run typical queries and operations to benchmark performance.</p></li><li><p>Stress-test the tech to see how it handles high loads.</p></li><li><p>Collect feedback from our development and operations teams on ease of use, integration with the existing architecture and such.</p></li></ul><p>Additionally, when picking a new tech, we need to conduct thorough research on the technology, including its documentation, whether it is actively developed, community support, product stability, existing real-world deployments, associated licensing, maintenance and training costs, security assessment, etc.&nbsp;</p><h2>Post POC&nbsp;</h2><p>After a successful POC, we could implement the tech in a certain module of our project or a certain service on a small scale in production as opposed to going all in.&nbsp;</p><p>After monitoring the system behavior and performance, we can proceed to roll out the full-scale deployment. This approach minimizes the risk.</p><div><hr></div><p>Picking the right database or the fitting technology (any tech, for that matter) entails an understanding of its nuances to avoid a situation where a certain shortcoming or a quirk throws a wrench into the gears in production.</p><p>There is no formula or straightforward answer to every use case. The only answer that fits is "it depends." It depends on the factors we discussed above. Also, there are several use cases where a single database fails to serve all our requirements and we proceed with polyglot persistence, with multiple databases working together.</p><p>I had a detailed discussion with a friend the other day on the same topic, as he was researching the fitting persistence technology for his project. I thought I'd write about it. 
Furthermore, I've prepared a checklist that will serve as a reference for our future DB research.</p><h2>Database research checklist</h2><h3>Performance requirements</h3><ol><li><p><strong>Throughput</strong></p><ul><li><p><strong>Read throughput</strong>: The number of read operations the database must handle per second.</p></li><li><p><strong>Write throughput</strong>: The number of write operations the database must handle per second.</p></li><li><p><strong>Total Requests per Second (RPS)</strong>: Combined read and write operations the DB must handle per second. This gives a comprehensive picture of the DB load.</p></li></ul></li><li><p><strong>Latency</strong></p><ul><li><p><strong>Maximum acceptable read latency</strong></p></li><li><p><strong>Maximum acceptable write latency</strong></p></li><li><p><strong>Maximum acceptable consistency latency</strong>: The time the data takes to be eventually consistent.</p></li></ul></li></ol><h3>Scalability</h3><ol><li><p><strong>Horizontal scalability</strong></p></li><li><p><strong>Vertical scalability</strong></p></li><li><p><strong>Elasticity</strong>: The ability to scale up/down automatically based on the load.</p></li></ol><h3>Availability and Reliability</h3><ol><li><p><strong>High availability</strong></p><ul><li><p>Fault tolerance and redundancy.</p></li><li><p>Multi-region/multi-zone deployment.</p></li></ul></li><li><p><strong>Disaster recovery</strong></p><ul><li><p>Backup and restore capabilities.</p></li><li><p>Data replication and failover mechanisms.</p></li></ul></li><li><p><strong>Durability</strong></p><ul><li><p>Ensuring data is not lost in the case of failures.</p></li></ul></li></ol><h3>Data requirements</h3><ol><li><p><strong>Data volume</strong></p><ul><li><p>Current data size.</p></li><li><p>Expected data growth over time.</p></li></ul></li><li><p><strong>Data model</strong></p><ul><li><p>Structure: Relational/Non-relational (documents, key-value pair, graph, 
etc.)</p></li></ul></li><li><p><strong>Consistency</strong></p><ul><li><p><strong>Strong consistency</strong></p></li><li><p><strong>Eventual consistency</strong></p></li><li><p><strong>Tunable consistency</strong></p></li></ul></li></ol><blockquote><p>Check out my newsletter post for a detailed read on <a href="https://shivangsnewsletter.com/p/understanding-database-consistency">data consistency</a>.</p></blockquote><h3>Operational Ease</h3><ol><li><p><strong>Management complexity</strong></p><ul><li><p>Ease of setup and configuration.</p></li><li><p>Effective monitoring and maintenance tools available.</p></li><li><p>Administrative overhead and required expertise.</p></li></ul></li><li><p><strong>Support and documentation</strong></p><ul><li><p>Availability of professional support from the vendor.</p></li><li><p>Quality of documentation and community support.</p></li></ul></li><li><p><strong>Ecosystem and integrations</strong></p><ul><li><p>Compatibility with existing tools and applications.</p></li><li><p>Availability of drivers and connectors for various programming languages and frameworks.</p></li></ul></li></ol><h3>Security and Compliance</h3><ol><li><p><strong>Security features</strong></p><ul><li><p>Authentication and authorization mechanisms.</p></li><li><p>Encryption (in transit and at rest).</p></li><li><p>Auditing and logging capabilities.</p></li></ul></li><li><p><strong>Compliance</strong></p><ul><li><p>Adherence to industry standards and regulations (GDPR, HIPAA, etc).</p></li></ul></li></ol><h3>Cost considerations</h3><ol><li><p><strong>Licensing costs</strong></p><ul><li><p>Open-source vs. commercial licensing fees.</p></li><li><p>Enterprise editions and their costs.</p></li></ul></li><li><p><strong>Operational costs</strong></p><ul><li><p>Infrastructure costs (cloud vs. 
on-premises).</p></li><li><p>Maintenance and administration costs.</p></li></ul></li><li><p><strong>Scalability costs</strong></p><ul><li><p>Cost implications of scaling horizontally or vertically.</p></li><li><p>Auto-scaling costs in cloud environments.</p></li></ul></li></ol><h3>Documentation and Training</h3><ol><li><p><strong>Training Resources</strong></p><ul><li><p>Availability of tutorials, courses and certification programs.</p></li><li><p>Access to a knowledgeable community or user group.</p></li></ul></li><li><p><strong>Case Studies</strong></p><ul><li><p>Similar use cases to ours and success stories.</p></li></ul></li></ol><h3>Performance and Testing</h3><ol><li><p><strong>Benchmarking</strong></p><ul><li><p>Conduct performance tests under load conditions.</p></li><li><p>Compare against expected throughput and latency.</p></li></ul></li><li><p><strong>Proof of Concept (POC)</strong></p><ul><li><p>Implement a small-scale version of your application.</p></li><li><p>Test real-world performance, scalability, and reliability.</p></li></ul></li></ol><h3>Future-Proofing</h3><ol><li><p><strong>Roadmap and Updates</strong></p><ul><li><p>Vendor&#8217;s commitment to future improvements and features.</p></li><li><p>Community activity and upcoming releases.</p></li></ul></li><li><p><strong>Flexibility</strong></p><ul><li><p>Ability to adapt to changing requirements.</p></li><li><p>Ease of migration to other databases if needed.</p></li></ul></li></ol><p>I believe this checklist comprehensively covers the key points that we need to bear in mind when researching the right database for our project.</p><p>If you found this post insightful, <a href="https://backendinsights.com/p/backend7-picking-the-right-database">do share the web link</a> with your network for more reach. 
You can connect with me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a> &amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>, in addition to replying to this email.</p><blockquote><p>If you wish to learn system architecture from the bare bones, check out the&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to System Architecture Proficiency</a> learning path I've authored that educates you, step by step, on the domain of software architecture, cloud infrastructure and distributed system design.</p></blockquote><p>I'll see you around in my next post. Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[Coding projects for software developers - Part 2 - Build components and plumb them together]]></title><description><![CDATA[This is part two of my coding projects for developers post series. Do check out part one, which is the introductory part containing eight well-researched industry-relevant coding projects for you. 
I am also running a Discord server where you can share your coding progress with us in addition to having project development discussions with the community.]]></description><link>https://shivangsnewsletter.com/p/coding-projects-part-2</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/coding-projects-part-2</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Mon, 15 Jul 2024 09:40:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is part two of my coding projects for developers post series. <a href="https://shivangsnewsletter.com/p/coding-projects">Do check out part one</a>, which is the introductory part containing eight well-researched industry-relevant coding projects for you.</p><p>I am also running a <a href="https://discord.gg/a9bbwETTvE">Discord server</a> where you can share your coding progress with us in addition to having project development discussions with the community.</p><p>With that being said, let's get started.</p><h2>Project 9: Code a text editor from scratch (This is an interesting project; read on to know why)</h2><p><strong>Tags</strong>: Data structures, String algorithms</p><p><strong>Level</strong>: All</p><p><a href="https://viewsourcecode.org/snaptoken/kilo/">Build your own text editor</a> is an instruction booklet that helps you build your own text editor in the C programming language.</p><p>It's about 1000 lines of C in a single file with no dependencies and implements all the basic features you would expect in a minimal text editor, including syntax highlighting and search.</p><p>Here is a <a href="https://flenker.blog/hecto/">reimplementation of the same project in Rust</a>.</p><p>Here is the <a href="https://github.com/rxi/lite">GitHub repo</a> of another text editor 
written in Lua and C. You'll find the <a href="https://rxi.github.io/lite_an_implementation_overview.html">project doc here</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iRzv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iRzv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png 424w, https://substackcdn.com/image/fetch/$s_!iRzv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png 848w, https://substackcdn.com/image/fetch/$s_!iRzv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png 1272w, https://substackcdn.com/image/fetch/$s_!iRzv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iRzv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png" width="1456" height="908" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:908,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Build a text editor from scratch&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Build a text editor from scratch" title="Build a text editor from scratch" srcset="https://substackcdn.com/image/fetch/$s_!iRzv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png 424w, https://substackcdn.com/image/fetch/$s_!iRzv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png 848w, https://substackcdn.com/image/fetch/$s_!iRzv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png 1272w, https://substackcdn.com/image/fetch/$s_!iRzv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1443816-10f0-4086-a790-eeadbe6e7e56_1643x1025.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://github.com/rxi/lite">Image src</a></figcaption></figure></div><h2>Key learnings from this project</h2><p>When you code a text editor from scratch, it will put your data structures and algorithm knowledge (string algorithms primarily) to the test.</p><h3>Common data structures and algorithms that are used to implement a text editor</h3><p>Some of the primary data structures and algorithms that are used to implement a text editor app are:</p><p><strong>Gap buffer</strong></p><p>The gap buffer data structure, which is a dynamic array, is used for efficient insertions and deletions at the cursor position. 
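</p><p>As a rough illustration, a gap buffer can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the implementation of any particular editor: the class name <code>GapBuffer</code>, its method names and the default gap size of 16 slots are all made up for this example.</p>

```python
class GapBuffer:
    """Hypothetical minimal gap buffer: a list with a run of unused slots at the cursor."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)    # index of the first unused slot (the cursor)
        self.gap_end = len(self.buf)  # index one past the last unused slot

    def move_cursor(self, pos):
        # Clamp pos to the valid range [0, length of the stored text].
        text_len = len(self.buf) - (self.gap_end - self.gap_start)
        pos = max(0, min(pos, text_len))
        # Moving left: shift characters from before the gap to after it.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        # Moving right: shift characters from after the gap to before it.
        while pos > self.gap_start:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, ch):
        # Writing into the gap is O(1) unless the gap is exhausted.
        if self.gap_start == self.gap_end:
            self._grow()
        self.buf[self.gap_start] = ch
        self.gap_start += 1

    def delete(self):
        # Backspace: widen the gap by one slot to the left.
        if self.gap_start > 0:
            self.gap_start -= 1

    def _grow(self):
        # Re-open the gap at the cursor, at least doubling the array size.
        extra = max(16, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

<p>Starting from <code>GapBuffer("helo")</code>, moving the cursor to position 3 and inserting "l" gives the text "hello". Repeated typing at the same spot never shifts the rest of the text, which is the whole point of keeping the gap at the cursor.</p><p>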
It maintains a gap, which is a sequence of unused elements within the array; the gap is moved to the cursor position when the user performs an insertion or deletion operation.</p><p>The amortized time complexity of inserting elements at the current cursor position with this data structure is O(1) (moving the gap to a distant cursor position costs time proportional to the distance moved), which makes it fitting for applications that need to modify data with low latency.</p><p><strong>Rope</strong></p><p>A rope data structure is a binary tree, with short pieces of the string held in its leaves, that is leveraged to efficiently store and manipulate large strings. Text operations that require frequent insertions, deletions and concatenations of substrings can be performed more efficiently, with minimal delay, with ropes than with traditional array-based strings.</p><p><strong>Piece table</strong></p><p>The piece table data structure maintains two buffers: one for the original text and another for the modifications. It keeps track of what parts of the text need to be displayed and in what order without modifying the original text.</p><p>The data structure efficiently manages the undo and redo operations by keeping a history of changes. It also reduces the need for frequent memory reallocations. Only the newly added text is stored in the second (add) buffer, which is more memory-efficient than copying the entire text for each change.</p><p><strong>Stack</strong></p><p>A stack can be used to implement the undo and redo operations as well, as it follows the Last In, First Out (LIFO) principle, making it suitable for maintaining the history of operations for undo/redo functionality.</p><p>However, it can become inefficient if the history of actions is extensive or if complex actions need to be reversed. 
In this scenario, the piece table data structure is more efficient in maintaining and applying a history of changes when handling large texts with a significant number of modifications.</p><p><strong>Linked list</strong></p><p>With linked lists, we can have each line of text as a node, allowing efficient insertion and deletion of lines. Cursor movements can be handled by traversing the list.</p><p>If we need more fine-grained control, we can have each node store a character, enabling insertions and deletions at any position within the text. Linked lists allow for O(1) insertions and deletions once the position is located (reaching that position still requires traversing the list), which makes them a fit for a text editor where the content changes frequently.</p><p>Furthermore, this data structure does not have a predefined size, enabling the text editor to handle arbitrarily large texts.</p><p><strong>Trie</strong></p><p>The trie data structure stores strings as a series of linked nodes and facilitates fast lookup for prefixes. Autocomplete and spell-check features are commonly implemented using tries.</p><p><strong>Suffix tree</strong></p><p>A suffix tree facilitates efficient search for substrings and pattern matching within the text. This structure contains all the suffixes of a given text as its keys and their positions in the text as its values.</p><p><strong>String algorithms</strong></p><p>Several string-based search algorithms are leveraged to manage text in editors, such as Knuth-Morris-Pratt (KMP), Boyer-Moore, Rabin-Karp, etc.</p><p>The KMP algorithm is efficient for searching substrings and pattern matching in large texts. 
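</p><p>To make the idea concrete, here is a minimal Python sketch of KMP search. It is an illustration and not the code of any particular editor; the function names <code>build_failure_table</code> and <code>kmp_search</code> are made up for this example. The sketch first precomputes, for every prefix of the pattern, the length of the longest proper prefix that is also a suffix, and then uses that table so that no character of the text is ever re-read after a mismatch.</p>

```python
def build_failure_table(pattern):
    """fail[i] is the length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # continue, so overlapping matches are found too
    return matches
```

<p>For instance, <code>kmp_search("abracadabra", "abra")</code> returns <code>[0, 7]</code>. The search takes time proportional to the combined length of the text and the pattern, which is what makes it attractive for a find feature over large files.</p><p>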
Boyer-Moore solves the same problem by leveraging heuristics that let it skip over parts of the text.</p><p>Greedy algorithms and dynamic programming can be leveraged to place line breaks at appropriate positions so that the text fits within a given width.</p><blockquote><p><strong>Related read</strong>: <a href="https://en.wikipedia.org/wiki/String-searching_algorithm">String search algorithms</a></p></blockquote><p>To further delve into the details of text editing, check out this book, '<a href="https://www.finseth.com/craft/">The Craft of Text Editing</a>'.</p><blockquote><p>If you implement this project, you'll get good hands-on practice with data structures and algorithms. In addition, you'll learn many backend engineering concepts that I'll discuss in the subsequent project.</p><p>Also, you may not have to implement all the standard text editor features in the first go. You can pick the important ones and code them.</p></blockquote><p>Once you have the text editor ready, let's put it online as a real-time concurrent text editing service like Google Docs.</p><div><hr></div><h2>Project 10: Building a real-time collaborative text editor like Google Docs</h2><p><strong>Tags</strong>: System design, Web, Full stack</p><p><strong>Level</strong>: Intermediate, Advanced</p><p>Regarding building a web service like Google Docs, here are a few resources that'll help you do that:</p><p><a href="https://www.freecodecamp.org/news/build-a-google-docs-clone-with-react-and-firebase/">freeCodeCamp post on building a Google Docs clone</a> with React, Firebase and Material UI.</p><p>This <a href="https://blog.logrocket.com/build-google-doc-clone-html-css-javascript/">LogRocket article</a> builds the same with plain JavaScript and Firebase on the backend.</p><p>And <a href="https://www.youtube.com/watch?v=iRaelG7v0OU">this video tutorial</a> does the same with React, Socket.io, and MongoDB.</p><p>The above resources will help you implement your web service. 
You may or may not use the same tech for your project. Pick the tech that you are most comfortable with and that fits the use case.</p><p>Besides coding the web service, you would need to know a few system design concepts like CRDTs and operational transformation to understand how conflicting edits (when multiple users are modifying the same content) are dealt with in real-time collaboration services.</p><h3>CRDT (Conflict-free Replicated Data Types)</h3><p>A conflict-free replicated data type is a data structure that is replicated across multiple nodes in a distributed system and enables concurrent updates to a resource without coordinating with other nodes in real time. <a href="https://shivangsnewsletter.com/p/understanding-database-consistency">The concurrent updates are merged eventually</a>.</p><p>A conflict resolution algorithm that is a part of this data type automatically resolves the inconsistencies that might occur. However, saying this doesn't make resolving concurrent conflicts any easier. While implementing CRDTs, we need to design conflict-resolving logic and test the scenarios thoroughly.&nbsp;</p><p>An overly simple example of a CRDT is a boolean flag that tracks the occurrence of an event in a distributed system. If that event occurs on any of the nodes in the system, the flag is set to true on that node. When the writes are merged eventually (with a node having the flag as true and others as false), the final resolved result is true: the event has occurred.</p><p>As the name implies (conflict-free replicated data types), different data types, such as the stack, queue, list, counter, etc., are leveraged in CRDTs, each equipped with different techniques to resolve conflicts efficiently.</p><h3>Redis CRDTs</h3><p>Redis uses CRDTs to automatically handle conflicting concurrent reads and writes across multiple Redis nodes spread across the globe. 
CRDTs are implemented using a global database spanning multiple clusters.</p><p>Such a database is called a Conflict-free Replicated Database or CRDB.</p><p>Within a CRDB, every cluster holds a local database that behaves just like a regular Redis database; these local databases replicate to each other, facilitated by a process called CRDB Syncer.</p><p>CRDBs resolve conflicts in concurrent write operations based on defined rules per data type. The CRDB Syncer communicates the writes to nodes in a streaming fashion, compressing the data in motion. The sync process is intelligent enough to handle the interruptions due to network partitions and other reasons.</p><p>If you are intrigued and want to read more on how Redis handles conflicting writes and the rules involved per data type, check out <a href="https://redis.io/blog/diving-into-crdts/">this</a> and <a href="https://redis.io/active-active/">this</a> resource.</p><blockquote><p><strong>Related read</strong>: Riot Games&nbsp;<a href="https://technology.riotgames.com/news/chat-service-architecture-persistence">leverages CRDTs</a> to implement the in-game chat system of their game League of Legends.</p></blockquote><h3>Operational Transformation</h3><p>Operational transformation is a technique primarily used in resolving conflicts in collaborative systems, such as online text editors like Google Docs, where multiple users work on the same content concurrently.&nbsp;</p><h3>How Google Docs leverages operational transformation to resolve concurrent data conflicts</h3><p>Every user has their own local copy of data to work on in a lock-free, non-blocking manner. The changes are asynchronously propagated across the system, delivering all the users a consistent view. To achieve a consistent view, the data is transformed before being displayed to the other users.&nbsp;</p><p>Here is how:</p><p>Imagine two users, A and B, working on the same text concurrently. 
Consider a string "<em>abc</em>". User A adds a character<em> z</em> at position 0. For him, the string will now be "<em>zabc</em>"; the same is updated on the server.</p><p>The string for user B is still "<em>abc</em>" for now and she deletes the character&nbsp;<em>c</em> at position 2. On the server, the revised string after deleting the character at position 2 will be "<em>zac</em>", which is incorrect since the character she meant to delete was&nbsp;<em>c,</em> not&nbsp;<em>b</em>. The correct string should be "<em>zab</em>".</p><p>To avoid these inconsistencies, the changes are transformed operationally before being replicated across users or, in simple words, passed through a function with specified rules to achieve the correct effect.</p><p>So, the change made by user B before being replicated will be passed through an OT function for the correct effect. Once passed, the string "<em>zab</em>" will be updated on the screens of all the users working on it concurrently.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-085!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-085!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-085!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!-085!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-085!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-085!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg" width="1148" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:1148,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;CRDTs Operational transformation Google docs&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="CRDTs Operational transformation Google docs" title="CRDTs Operational transformation Google docs" srcset="https://substackcdn.com/image/fetch/$s_!-085!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-085!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!-085!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-085!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa138a25d-6e26-4821-934f-031f05422192_1148x540.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Several operational transformation algorithms have been designed with different capabilities and consistency models for different 
applications.</p><h2>MongoDB Operational Transformation&nbsp;</h2><p>MongoDB Atlas App Services&nbsp;<a href="https://www.mongodb.com/docs/atlas/app-services/sync/learn/conflict-resolution/">leverages operational transformation</a>&nbsp;to resolve conflicts with predefined conflict resolution rules.&nbsp;</p><p>Taking the example from the documentation:</p><p>Matt and Sarah are working on data for their dog-walking business. Matt deletes data on one of their client's dogs, Doug, as they no longer need to walk him. While Sarah is out without an internet connection, she edits Doug's required walk time data on her local, offline version, as she does not know about Matt's deletion of Doug's data.</p><p>Once Sarah regains the internet connection, her change is sent to the server. As deletes always win according to App Services' conflict resolution rules, Matt's deletion is kept by the server as opposed to Sarah's edit. The server will not send Sarah's edits to Matt's device. After the transformation, the data is again in agreement across Matt and Sarah's devices.&nbsp;</p><blockquote><p>Here are a few good reads on CRDTs for further exploration:</p><p><a href="https://www.cs.cmu.edu/~csd-phd-blog/2023/collaborative-data-design/">Designing data structures for collaborative apps</a> </p><p><a href="https://digitalfreepen.com/2017/10/06/simple-real-time-collaborative-text-editor.html">A simple approach to building a real-time collaborative text editor</a> </p></blockquote><p><a href="https://docs.yjs.dev/">Yjs is a high-performance CRDT</a> for building collaborative applications that sync automatically. Many services use it to implement real-time collaboration features. 
Do check out the docs for implementation insights.</p><p>Furthermore, if you run a Google search for 'How to build a real-time collaborative app using CRDTs', you'll get a bunch of tutorials.</p><div><hr></div><h2>A case study on building a real-time collaborative text editor for the browser from scratch in JavaScript</h2><p><a href="https://conclave.tech/">Conclave</a> is an open-source, real-time, collaborative text editor for the browser built from scratch in JavaScript.</p><p>It uses CRDTs to keep the users in sync and WebRTC to enable users to send messages directly to one another over a decentralized peer-to-peer network.</p><p>Here is a <a href="https://conclave-team.github.io/conclave-site/">detailed case study</a> on how they built it, the backend architecture, etc. It's a pretty insightful read.</p><h2>Plumbing everything together</h2><p>If you code the above two projects, clubbing the components together, it will provide you with a solid hands-on learning experience. The resources that I have shared have implementations in varied technologies, in addition to discussions on system design concepts.</p><p>The right way to implement your project would be to go through the resources, understand the concepts and implement your code from scratch in any programming language and technology stack of your choice. 
If you understand the fundamentals and the associated concepts, implementing the project won't be a problem for you.</p><p>Moreover, if you have any issues with the implementation, we have our <a href="https://discord.gg/a9bbwETTvE">Discord server</a> where you can discuss things with us.</p><p>Also, this is a recommended read: <a href="https://shivangsnewsletter.com/p/understanding-large-codebases">How to wrap our heads around large codebases and open-source GitHub repositories</a></p><blockquote><p>Furthermore, if you wish to master web architecture and system design starting from zero, check out my&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency learning path</a>, comprising three courses that go through all the associated concepts in an easy-to-understand language. The courses educate you, step by step, on the domain of web architecture, cloud infrastructure and distributed system design.</p></blockquote><p>I am also implementing a distributed system from the bare bones. Here is <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">part one of the series</a> of newsletter posts that I'll be publishing on that.</p><blockquote><p>Additionally, if you want to start your journey of implementing a distributed system from the bare bones, check out <a href="https://bit.ly/3swSHHl">CodeCrafters</a> (Affiliate). It is a platform that helps us code systems like Redis, Docker, Git, a DNS server, and more step-by-step from the bare bones in the programming language of our choice. </p><p>It's designed to help developers learn how to build complex systems from the ground up, focusing on teaching the internals of distributed systems and other related technologies through hands-on, project-based learning. </p><p>Each project is broken down into stages, with each stage focusing on implementing a specific feature or component of the system. 
Do check it out and kickstart your hands-on systems programming learning. If you decide to make a purchase, you can use&nbsp;<a href="https://bit.ly/3QL4TN0">my unique link to get 40% off</a>.</p></blockquote><p>More projects are coming soon to this list. If you wish to be notified of my future posts, including the new project additions to this list, do&nbsp;<a href="https://shivangsnewsletter.com/">subscribe to this newsletter</a> (if you haven&#8217;t yet). </p><p>Also, if you found this post insightful, <a href="https://shivangsnewsletter.com/p/coding-projects-part-2">please do share this web URL</a> with your network for more reach.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/subscribe?"><span>Subscribe now</span></a></p><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a>&nbsp;as well. I'll see you in the next post. Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[How to wrap our heads around large codebases and open-source GitHub repositories]]></title><description><![CDATA[When we face an unfamiliar large codebase either at our workplace or an open-source GitHub repository, our first thought is, 'How do I understand the high-level architecture of this project? 
I need to understand it inside out to learn and be able to contribute to it.']]></description><link>https://shivangsnewsletter.com/p/understanding-large-codebases</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/understanding-large-codebases</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Fri, 12 Jul 2024 06:34:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f5fe8970-9cb5-466f-aceb-84815031550d_1852x920.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When we face an unfamiliar large codebase either at our workplace or an open-source GitHub repository, our first thought is, 'How do I understand the high-level architecture of this project? I need to understand it inside out to learn and be able to contribute to it.'</p><p>We then click through the endless code files and are unavoidably overwhelmed. Trying to understand the high-level project architecture and business use cases encompassing several different complex system modules via code is the wrong move. It's a surefire way to get overwhelmed.</p><p>The advisable way is to go through the project documentation, including the design docs (if you have them available), to understand the high-level architecture, including various flows and business use cases.</p><p>Let's understand this better with the help of an example. Say, I browse through the <a href="https://github.com/elastic/elasticsearch">GitHub repo of Elasticsearch</a>. Elasticsearch is the de facto distributed search and analytics solution used in enterprise projects. 
The repo is continually evolving with support for vector searches, RAG, generative AI apps, real-time searches over massive datasets, etc.</p><p>To gain insights into what the project does, including the high-level architecture, we should go through the README file, the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html">project documentation</a>, and the <a href="https://www.elastic.co/blog">blog</a> as opposed to trying to figure these things out via code.</p><p>Project docs provide comprehensive information about the product architecture, different modules, internal specifics, deployment, etc. Blogs ideally contain more related information about the product in addition to simple hands-on examples. This is where we get our foot in the door.</p><p>Wrapping our heads around large codebases and open-source GitHub repositories is not trivial and needs a significant time investment.</p><p>Hoping to start contributing to a large codebase with minimal time investment is like walking into an organization whose codebase has been developed for years and pushing code to production on day one.</p><h2>The open-source contribution insanity</h2><p>A surprisingly large number of influencers are misguiding beginners to contribute to open-source to grow their skills and better their resumes, and their followers are falling for this en masse. This is like expecting a baby to climb Mt. Everest.</p><p>And with projects with minimal or no documentation, this is like having to climb without the supplemental oxygen cylinders. What do you expect the outcome to be?</p><p>I have always advocated against this. Open-source contribution is not for beginners. If you want to grow your skills, develop projects from the bare bones. 
Know thy fundamentals; be good with low-level and high-level design patterns, system architecture, databases and everything that you would need to build a product from the bare bones.</p><blockquote><p>I have aggregated a bunch of <a href="https://shivangsnewsletter.com/p/coding-projects">industry-relevant coding projects</a> to help you get more hands-on practice and improve your development skills. Check those out.</p></blockquote><blockquote><p>If you want to master web architecture and system design, check out my&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency learning path</a>, comprising three courses that go through all the concepts starting from zero in an easy-to-understand language. The courses educate you, step by step, on the domain of web/software architecture, cloud infrastructure and distributed system design.</p></blockquote><p>Do not get caught up with the idea of contributing to the open-source right at the beginning of your career. Be deft with writing code, build scalable services from scratch and then maybe, if you feel like it, delve into open-source contribution.</p><p>Moreover, even if you want to write open-source code as a beginner as opposed to grappling with a mature, complex codebase, pick a proprietary product and build an open-source version of it from scratch. You'll learn dramatically more in contrast to contributing a few lines of code to an existing repo.</p><div><hr></div><p>We were discussing that to understand the high-level architecture of a repository, we should look into the project docs.</p><h2>What if there is no documentation?</h2><p>If the documentation doesn't exist, then it is going to be an uphill battle. 
It will be like getting into a battle royale mode where we have to scavenge for resources after being dropped into an arena.</p><p>Check the README file, go through the commit history, and try to get in touch with the project collaborators and core devs to get insights into the project.</p><h2>Getting our hands dirty</h2><p>The next step is to check the test cases to understand the business logic and other code functionality. If the functions are well-documented, that would help a lot in putting the pieces together.</p><p>Deploy the code on your local machine and start debugging. We may not have to understand every nook and cranny of the codebase; rather, we can focus on specific flows.</p><p>If the project has a UI, for instance, if it's a web-based application, check the endpoints different sections of a webpage are hitting. Find those endpoints on the backend, set breakpoints on them, and navigate through the code to figure out the flow.</p><p>This is the best way to understand large codebases. After enough debugging, you'll be able to make sense of larger parts of the codebase.</p><h2>Feature development and production support</h2><p>We prefer product development roles over production support jobs that largely involve bug fixing and site reliability tasks. However, debugging and fixing varied bugs across the system provides us with a deep knowledge of system design, which may not be possible during feature development, where we are focused on a specific part of the system.</p><p>Furthermore, while fixing bugs, we get exposed to different technologies used in the project, collaborate with different teams like testing, ops, etc., and delve into things like system observability, efficiency, finding and getting rid of bottlenecks, infrastructure scaling and related things.</p><p>Via this, we develop critical problem-solving skills from a system and infrastructure standpoint. 
Regularly fixing varied bugs hones our troubleshooting and diagnostic skills, making us adept at identifying and resolving issues quickly.</p><blockquote><p>I've written a detailed post on <a href="https://shivangsnewsletter.com/p/observability-in-distributed-systems">system observability</a> in case you want to read it.</p></blockquote><p>When we deploy a project on our local machine and debug through the code, we get an experience similar to working as a developer on prod support. Debugging helps us understand the code flow and how different classes and functions interact via design patterns and such, and it enhances our knowledge of low-level code design.</p><p>As a beginner, you can learn from open-source, but don't be hasty about contributing to it just for the heck of it. Instead, <a href="https://shivangsnewsletter.com/p/coding-projects">build your own projects from scratch</a>.</p><p>I am saying this from experience, as in my career, I've developed products and features from the bare bones, in addition to working as a developer on support on massive codebases, and I have learned immensely, having gotten the best of both worlds.</p><h2>Can we leverage LLMs to understand a GitHub repo's code?</h2><p>I ran an extensive search on whether we can leverage an AI tool that would scan through a codebase and delineate the information for easy understanding.</p><p>I came across a few tools, and almost all have a paywall, so I couldn't check how effective they were. Rightly so, since scanning through a large repo and figuring things out would require significant compute. Also, before scanning third-party code via AI, we need to first check the license of the repository we intend to scan. Do go through their usage rights and respect them.</p><p>Also, my experience with AI is that you shouldn't take everything it says at face value. It continually makes subtle mistakes that are harder to catch, especially if you have less experience on the topic you are taking AI's advice on. 
Cross-verify when you have doubts.</p><p>Know your domain.</p><p>If you found this post helpful, consider sharing it with your friends for more reach. If you are reading the&nbsp;<a href="https://shivangsnewsletter.com/">web version of this post</a>, consider subscribing to get my posts delivered to your inbox as soon as they are published.</p><p>Check out the list of <a href="https://shivangsnewsletter.com/p/coding-projects">industry-relevant coding projects</a> you can do on the side for more hands-on practice.</p><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a> &amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a> and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a> as well. I'll see you in the next post. Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[Coding projects for software developers: Let’s get some hands-on practice – Part 1]]></title><description><![CDATA[Hello! Below is an aggregated list of software projects I am putting together that we can code over the weekend or over a span of a few days on the side. 
This will help us become better developers by having continual hands-on practice working with established and new technologies in addition to learning new concepts.]]></description><link>https://shivangsnewsletter.com/p/coding-projects</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/coding-projects</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Wed, 03 Jul 2024 04:46:39 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/496cfb13-700e-411b-ac09-7f0103aad2bb_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello! Below is an aggregated list of software projects I am putting together that we can code over the weekend or over a span of a few days on the side. This will help us become better developers by having continual hands-on practice working with established and new technologies in addition to learning new concepts.</p><p>We are often on the lookout for some good hands-on projects or real-world coding exercises that we can do on the side to enhance our skills and learn new tech and concepts, in addition to revising what we already know. A hands-on approach to learning not only augments our confidence but also expedites our career growth.</p><p>Moreover, there is a real scarcity of resources that help us find good side projects to work on. Furthermore, if you run a search for software projects, you&#8217;ll find outdated resources suggesting things like coding a library management system, a blogging system, a calculator app, a task management system, a restaurant management system and so on. You get the idea.</p><p>Honestly, reading about these projects SUCKS THE SOUL OUT OF ME. They are dangerously dull and are suitable for learning basic OOP, not much more than that. Hence this handpicked, industry-relevant, well-researched list that I am putting together, ensuring the projects cover all skill levels (from beginner to intermediate to advanced). 
Additionally, I&#8217;ve ensured these projects and the tasks they contain are micro or moderately sized so as not to overwhelm us.</p><blockquote><p>This is an ongoing list where projects will be continually added. If you find it helpful, I urge you to do two things:<br><br>1. <a href="https://shivangsnewsletter.com/p/coding-projects">Bookmark this page and share it with your network</a> so that most of us can take advantage of it. If you are sharing this on your social handles, do tag me on <a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a> or <a href="https://x.com/shivang_z">X</a> (wherever you are) so that I get notified.<br><br>I&#8217;ve created a <a href="https://discord.gg/VV2j4RDQ">Discord server</a> to get together and discuss our progress on weekend projects, staying accountable, including having discussions on industry trends and career growth. Join this as well; kickstart your next project and share your progress with us. <br><br>Sharing this list with your network and your progress would mean a lot and keep me motivated to extend this list actively.<br><br>2. 
If you have any ideas on new fun projects that can be added to the list, do connect with me on the above-stated social networks or send me an email to <em>contact@scaleyourapp.com</em></p></blockquote><p>With this being said, let&#8217;s get on with it.</p><h2>Project 1: Analyzing railway traffic in the Netherlands with DuckDB</h2><p><strong>Tags</strong>: Database, Data Analytics, SQL<br><strong>Level</strong>: Beginner, Intermediate</p><p>DuckDB&nbsp;is an open-source in-memory relational table-oriented OLAP (Online Analytical Processing) database focusing on low-latency in-memory data processing and analytics use cases.</p><p>They <a href="https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands.html">analyzed a real-world open Dutch railway dataset</a> with DuckDB to showcase some of the product&#8217;s features.</p><p>The core functionality of the project includes running queries to find the busiest station per month (fun fact: it&#8217;s not Amsterdam :)), the top three busiest stations for each summer month, the largest distance between train stations in the Netherlands, querying remote data files, etc.</p><p>This is a good weekend project for those who want to get their hands dirty running SQL queries with an in-memory analytical database. Moreover, the <a href="https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands.html">linked resource</a> guides you on how to do it so you won&#8217;t feel lost or left in the dark.</p><h3><strong>Key learnings from this hands-on exercise</strong></h3><p>You&#8217;ll learn:</p><ul><li><p>To run analytical SQL queries on a real-world dataset.</p></li><li><p>To run SQL queries on remote datasets over HTTP and the <a href="https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html">S3 API</a>.</p></li><li><p>About in-memory databases, processes and threads and the related concepts. 
I&#8217;ve discussed <a href="https://scaleyourapp.com/in-memory/">in-memory databases on my blog here</a>. Do give it a read.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x-E2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x-E2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png 424w, https://substackcdn.com/image/fetch/$s_!x-E2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png 848w, https://substackcdn.com/image/fetch/$s_!x-E2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png 1272w, https://substackcdn.com/image/fetch/$s_!x-E2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x-E2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png" width="1024" height="545" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:545,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;In-memory database&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="In-memory database" title="In-memory database" srcset="https://substackcdn.com/image/fetch/$s_!x-E2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png 424w, https://substackcdn.com/image/fetch/$s_!x-E2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png 848w, https://substackcdn.com/image/fetch/$s_!x-E2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png 1272w, https://substackcdn.com/image/fetch/$s_!x-E2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a2ebe16-659c-4a77-a80e-9aa3e38815f5_1024x545.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>DuckDB does not run as a separate process; it is embedded entirely in the host process. It is designed to operate within the same memory space and threads as the application that uses it, as opposed to running as a separate, independent DB process. This removes the need for inter-process communication, which reduces complexity and provides better performance.</p><p>Since the database focuses on OLAP use cases, it leverages a columnar-vectorized query execution engine where a large batch of values, aka a vector, is processed in one operation as opposed to one value at a time. 
This reduces the operational overhead significantly, requiring fewer CPU cycles.</p><blockquote><p>You&#8217;ll find a discussion on <a href="https://scaleyourapp.com/wide-column-and-column-oriented-databases/">OLAP wide-column databases on my blog here</a>.</p></blockquote><p>Traditional OLTP (Online Transaction Processing) databases like MySQL, SQLite or PostgreSQL process each row sequentially, thus requiring more compute.</p><p>OLAP use cases involve complex queries processing large data volumes, so performance is critical for keeping the latency low.</p><p>So, for instance, if we need to compute the sum of a column of integers, vectorized execution will load a chunk of integers into memory to perform the sum operation as opposed to processing each integer one by one.</p><p>This reduces the number of function calls and loop iterations, which are relatively expensive. Modern CPUs are optimized for handling operations on batches of data. Vectorized execution leverages CPU cache, SIMD (Single Instruction, Multiple Data) instructions, and CPU parallelism to achieve performance.</p><p>Modern databases like CockroachDB and ClickHouse leverage the same vectorized execution approach for performance.</p><p>If you delve deeper into the <a href="https://duckdb.org/docs/index">database docs</a> and my articles that I&#8217;ve linked above, you&#8217;ll learn all these backend engineering, distributed systems and system design concepts as well, which I believe will augment your knowledge dramatically.</p><p>Furthermore, since the <a href="https://www.rijdendetreinen.nl/en/open-data">dataset is open</a>, you can run the same coding exercise with any OLAP database of your choice.</p><div><hr></div><h2><strong>Project 2: A web service managing real-time train running information</strong></h2><p><strong>Tags</strong>: Backend, Web Service, REST API, Testing, Observability, Deployment, Go, Prometheus, Docker<br><strong>Level</strong>: Intermediate, 
Advanced</p><p><a href="https://www.rijdendetreinen.nl/en">Rijden de Treinen</a> runs a backend service called GoTrain, implemented in Go, to receive, process and distribute real-time data about train services in the Netherlands.</p><p>The service is currently used in production and is accessible via the website and a mobile app. The backend service <a href="https://scaleyourapp.com/what-is-data-ingestion-how-to-pick-the-right-data-ingestion-tool/">ingests data streams</a> containing info on the statuses of running trains, processes the data, stores it in memory and provides it to third parties via a REST API.</p><p>Through the REST API, clients can request a summary of all departing trains for a single station, information on a departing train, upcoming arrivals and so on.</p><p>You&#8217;ll find the details of this backend service on its <a href="https://github.com/rijdendetreinen/gotrain">GitHub repo</a>. The project further integrates Prometheus for observability.</p><blockquote><p>I&#8217;ve delved deep into <a href="https://shivangsnewsletter.com/p/observability-in-distributed-systems">observability here</a>. Also, do go through <a href="https://scaleyourapp.com/what-is-grafana-why-use-it-everything-you-should-know-about-it/">this post to understand how Prometheus</a> fits into the picture.</p></blockquote><h3><strong>Key learnings from this project</strong></h3><p>The service ingests, processes and stores real-time data streams. 
By working on this project, you&#8217;ll understand this web service architecture and how data is ingested in real time and processed in memory for efficiency.</p><blockquote><p>To go through the fundamentals of <a href="https://scaleyourapp.com/web-application-architecture-explained/">web architecture</a> and <a href="https://scaleyourapp.com/application-architecture/">application architecture</a>, check out the linked posts that I&#8217;ve written on my blog.</p></blockquote><p>Additionally, this <a href="https://scaleyourapp.com/in-memory/">in-memory processing article</a> will help you understand how in-memory processing reduces latency. And since the information this app deals with is real-time, low latency is crucial.</p><p>You&#8217;ll also get insights into REST API implementation, structuring API endpoints, serving dynamic information to clients and integration with external data sources.</p><p>Furthermore, the repo also covers data archiving for future analysis, queuing for asynchronous processing, containerization and setting up observability. I&#8217;ve linked the observability articles above for you to understand the fundamentals.</p><p>Containerization will help you understand deployment, scalability and the cloud-native architecture.</p><blockquote><p>If you are a beginner, you can skip these additional components and focus on coding the data ingestion, in-memory processing and the REST API part. Once you are done with these, the rest can be implemented later.</p></blockquote><p>At the time of writing, the repo is looking to increase its test coverage, so you can work on test cases as well. Additionally, you can build custom client dashboards by extending the REST API.</p><blockquote><p>Though the project is written in Go, nothing stops us from implementing the service in the programming language and the tech stack of our choice. 
It will be one hell of a learning experience implementing things from the bare bones. You&#8217;ll have excellent hands-on experience implementing a data-driven scalable application.<br><br>We can discuss things further in the <a href="https://discord.gg/VV2j4RDQ">Discord group</a>.</p></blockquote><div><hr></div><h2><strong>Project 3: Build a family cash card application</strong></h2><p><strong>Tags</strong>: Backend, Web Service, REST API, Testing, Java, Spring Boot, Spring Data</p><p><strong>Level</strong>: Beginner</p><p><a href="https://spring.academy/courses/building-a-rest-api-with-spring-boot/lessons/introduction">Building a simple family cash card application</a> is a project facilitated by the Spring Academy that teaches us to build a REST API from the bare bones with interactive, hands-on exercises.</p><p>The app allows parents to manage allowances in the form of digital debit cards for their kids. It gives them ease and control over managing funds for their children.</p><h3><strong>Key learnings from this project</strong></h3><p>By coding this project, you&#8217;ll learn:</p><ul><li><p>To implement a REST API</p></li><li><p>Fundamentals of API design and implementing API endpoints</p></li><li><p>To make the app secure by implementing authentication and authorization</p></li><li><p>Test-driven development</p></li><li><p>Persisting application data leveraging different application layers like the Controller and Repository</p></li><li><p>Software development principles like separation of concerns, loose coupling, etc.</p></li><li><p>Spring Boot fundamentals. You&#8217;ll be able to leverage the framework to implement other real-world projects.</p></li></ul><h2><strong>Layered architecture: Application layers</strong></h2><p>In most enterprise projects, you&#8217;ll find code split up into layers like the controller, service and data access layers. 
We can always add more layers to our code based on the requirements and the complexity of the project.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xqUJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xqUJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xqUJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xqUJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xqUJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xqUJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg" width="1024" height="613" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:613,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Layered architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Layered architecture" title="Layered architecture" srcset="https://substackcdn.com/image/fetch/$s_!xqUJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xqUJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xqUJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xqUJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32bf493a-fa21-41f2-82c6-0960e3ac611d_1024x613.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In the layered architecture, every layer has a specific role: the controllers handle requests specific to a certain business feature or domain, the service layer executes the business logic, the data access layer communicates with the database, and so on. These layers communicate with each other via interfaces to keep things loosely coupled and abstracted.</p><p>With this, a specific layer, such as the service layer, doesn&#8217;t need to worry about what is going on in the controller or the DAO layer; it just does its job, that is, executing business logic and passing data between the DAO and the controller. Having a layered architecture helps implement the separation of concerns design principle.</p><p>With this architecture, a change in a certain layer of the code won&#8217;t impact other layers much. The layers are isolated. 
This facilitates easy development and testing, keeping the code maintainable and extensible.</p><p>The project is written in Java with Spring Boot; however, you can also implement it in the programming language and the web framework of your choice. It is important to understand the code flow, application architecture, and system design. Once these are clear, you should be able to code the application in any language of your choice.</p><blockquote><p>Check out this post on <a href="https://scaleyourapp.com/master-system-design-for-your-interviews/">mastering system design for your interviews</a> written by me on my blog.</p></blockquote><div><hr></div><h2><strong>Project 4: Build a batch application that generates billing reports for a cell phone company</strong></h2><p><strong>Tags</strong>: Backend, Web Service, Java, Spring Boot, Spring Batch</p><p><strong>Level</strong>: Beginner, Intermediate</p><p><a href="https://spring.academy/courses/building-a-batch-application-with-spring-batch">Building a batch application with Spring Batch</a> is a hands-on project by Spring Academy that helps you code a robust, fault-tolerant batch application that generates billing reports for a fictional cell phone company.</p><p>You&#8217;ll implement a batch service called the Billing Job as shown below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uSNH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uSNH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp 424w, 
https://substackcdn.com/image/fetch/$s_!uSNH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp 848w, https://substackcdn.com/image/fetch/$s_!uSNH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp 1272w, https://substackcdn.com/image/fetch/$s_!uSNH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uSNH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp" width="950" height="535" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:535,&quot;width&quot;:950,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Batch processing application&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Batch processing application" title="Batch processing application" srcset="https://substackcdn.com/image/fetch/$s_!uSNH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp 424w, 
https://substackcdn.com/image/fetch/$s_!uSNH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp 848w, https://substackcdn.com/image/fetch/$s_!uSNH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp 1272w, https://substackcdn.com/image/fetch/$s_!uSNH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a3cd8d6-d82c-49b7-9c94-52ff59720938_950x535.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Img src: <a href="https://spring.academy/courses/building-a-batch-application-with-spring-batch/lessons/introduction">Spring Academy</a></figcaption></figure></div><p>The file preparation stage copies the file containing the monthly usage data for the cell phone company&#8217;s customers from the file server to the staging server.</p><p>The ingestion component ingests the file into a relational database that holds the data used to generate the billing reports.</p><p>The report generation component processes the billing information from the database and creates a flat file for the customers.</p><h3><strong>Key learnings from this project</strong></h3><p>You&#8217;ll learn:</p><ul><li><p>The fundamental concepts of batch processing in web services</p></li><li><p>Spring Batch and Spring Boot fundamentals</p></li><li><p>Application architecture involving the batch processing module</p></li><li><p>The importance of workflows in system design</p></li><li><p>To create, run and test fault-tolerant batch jobs</p></li></ul><h2><strong>Batch processing</strong></h2><p>Enterprise systems extensively leverage batch processing to run automated processes on the backend to fulfill their day-to-day business requirements.</p><p>Some of the core examples of this are billing and payroll analysis, medical bills and other information-processing tasks, data backups, report generation, transaction processing, sales inventory updates, etc.</p><p>Building a batch processing application from the bare bones will give you excellent insight into how such systems work and an edge as a backend developer.</p><div><hr></div><h2><strong>Project 5: Build a Hacker News clone backed by a GraphQL API</strong></h2><p><strong>Tags</strong>: Backend, Web Service, API, Frontend</p><p><strong>Level</strong>: Beginner</p><p>This <a href="https://www.howtographql.com/">free and open-source 
tutorial</a> helps us create a GraphQL app from zero to production. The project entails building a Hacker News clone from the bare bones, following best practices, and using the programming language and framework of your choice.</p><h3><strong>Key learnings from this project</strong></h3><p>You&#8217;ll learn the fundamentals of GraphQL and build a full-stack application from the bare bones.</p><div><hr></div><h2><strong>Project 6: Build an SQL-based algorithmic trading system with Redpanda and Apache Flink</strong></h2><p><strong>Tags</strong>: Backend, Data Processing, Data Streaming, Flink, Redpanda, SQL, Python</p><p><strong>Level</strong>: Intermediate, Advanced</p><p>Redpanda is an open-source data streaming platform like Kafka. The Redpanda team provides a comprehensive guide on how to <a href="https://university.redpanda.com/courses/use-cases-algorithmic-trading">build an SQL-based algorithmic trading system</a> with Redpanda, Apache Flink and some finance APIs.</p><p>The application enables end users to automatically make investment decisions using market data, including executing trades programmatically.</p><h3><strong>Key learnings from this project</strong></h3><p>You&#8217;ll learn:</p><ul><li><p>Data and stream processing fundamentals, including an architectural pattern called event sourcing</p></li><li><p>Integrating your code with external APIs</p></li><li><p>Building fast, low-latency systems capable of quickly reacting to market events</p></li><li><p>Building Apache Flink applications leveraging Redpanda</p></li><li><p>An emerging data streaming technology</p></li></ul><p>Familiarity with the fundamentals of Redpanda is a prerequisite for this course. You can take a hands-on fundamentals course on <a href="https://university.redpanda.com/courses/hands-on-redpanda-getting-started">getting started with Redpanda here</a>. 
Furthermore, the project does not demand prior knowledge of Flink or algorithmic trading.</p><h2><strong>Redpanda</strong></h2><p><a href="https://redpanda.com/">Redpanda</a> is an open-source data streaming platform built from the ground up in C++ for performance.</p><p>As per the <a href="https://redpanda.com/blog/redpanda-vs-kafka-performance-benchmark">benchmarks</a> published by Redpanda, it delivers at least 10x faster tail latencies than Kafka and uses up to 3x fewer nodes to do so.</p><p>The platform uses a thread-per-core architecture leveraging the Seastar framework to ensure high throughput. ScyllaDB, written in C++, also leverages the Seastar framework to be highly asynchronous with a shared-nothing design. ScyllaDB is optimized for <a href="https://scaleyourapp.com/parallel-processing/">modern multiprocessor, multicore NUMA cloud hardware</a> to run millions of operations per second at sub-millisecond average latencies.</p><blockquote><p>I&#8217;ve discussed the <a href="https://scaleyourapp.com/database-architecture-part-two/">ScyllaDB shard-per-core architecture on my blog here</a>. Do give it a read if you wish to delve into the details.</p></blockquote><p>Redpanda, being a data streaming platform, is a good fit for the algorithmic trading system as it enables us to continuously process data streams obtained from the finance APIs.</p><h2><strong>Apache Flink</strong></h2><p><a href="https://flink.apache.org/">Apache Flink</a> is an open-source stream processing framework with high-throughput, scalable data processing capabilities, supporting both batch and real-time data processing. 
It is leveraged by several big players in the industry.</p><p>Flink supports a wide range of use cases, from computing the average price of stocks over a sliding time window to something as simple as a 1:1 transformation of a single data point.</p><p>Algorithmic trading applications make heavy use of time-series analysis, and the framework&#8217;s windowing capabilities are leveraged to implement the project features. Furthermore, Flink&#8217;s state checkpointing and recovery features prove pretty helpful in coding such use cases.</p><h2><strong>Event sourcing</strong></h2><p>When implementing systems dealing with time-series data, as opposed to just computing the current data snapshot, the entire history of events is processed to arrive at an outcome.</p><p>The time-series data helps in understanding how the market changes over time, evaluating different trading strategies by replaying the respective events, and so on.</p><p>These requirements are addressed by an architectural pattern called the event sourcing pattern. <a href="https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing">Event sourcing</a> is a pattern where the changes that occur over a period of time are stored immutably as events in an append-only log. 
This log acts as a record, providing more comprehensive insights into the system, and when replayed, helps us reproduce past system state deterministically.</p><p>The technologies used in the project above are open-source, and implementing it from scratch will be pretty good hands-on practice for you.</p><div><hr></div><h2><strong>Project 7: Build a low-latency video streaming app with ScyllaDB &amp; NextJS</strong></h2><p><strong>Tags</strong>: Backend, Web Service, Cloud, ScyllaDB, NextJS, TypeScript</p><p><strong>Level</strong>: Intermediate, Advanced</p><p>ScyllaDB, <a href="https://www.scylladb.com/2024/01/09/build-a-low-latency-video-streaming-app/">in their blog article</a>, discussed a video streaming app with minimal features such as listing the videos that the user has started watching, resuming a video from where it was left off, displaying a progress bar under each video thumbnail, etc.</p><p>You&#8217;ll find the <a href="https://github.com/scylladb/video-streaming">GitHub repo for the project here</a>.</p><p><a href="https://www.scylladb.com/">ScyllaDB</a> is an open-source NoSQL wide-column database similar to Apache Cassandra. I&#8217;ve discussed NoSQL DB architecture with <a href="https://scaleyourapp.com/database-architecture-part-two/">ScyllaDB shard-per-core design here</a>. Do give it a read if you wish to delve deeper into it.</p><p>Earlier, <a href="https://scaleyourapp.com/state-of-backend-newsletter-issue-2/">Disney+ Hotstar replaced Redis and Elasticsearch with ScyllaDB</a> to implement the &#8216;Continue Watching&#8217; feature in their service. I&#8217;ve briefly written about it; do give it a read as well.</p><p>The above-linked resources will provide you with a background on ScyllaDB.</p><h3><strong>Key learnings from this project</strong></h3><p>Real-world video streaming services are complex, with many features in addition to storing videos in cloud object stores. 
However, the project above is minimal and gives us insights into how a low-latency NoSQL store can be leveraged to handle the large-scale data storage and retrieval requirements of a video streaming application.</p><blockquote><p>You&#8217;ll get insights into data modeling for a video streaming service, including how, ideally, the data modeling for an application or a certain feature should start with an understanding of the system&#8217;s queries and data retrieval patterns, as opposed to creating the schema first and working out the query patterns later.</p></blockquote><h2><strong>Distributed aggregates &amp; user-defined functions</strong></h2><p>The project also entails the use of distributed aggregates and user-defined functions. Distributed aggregates help us run aggregate operations (like SUM, COUNT, AVG, etc.) across a distributed system.</p><p>In distributed databases, where the data is spread across multiple nodes, distributed aggregates let us perform aggregation operations by leveraging the parallel processing capabilities of the database, without having to move the required data to a single node.</p><blockquote><p><strong>Related read</strong>: How <a href="https://www.scylladb.com/2023/06/20/how-scylladb-distributed-aggregates-reduce-query-execution-time-up-to-20x/">ScyllaDB distributed aggregates</a> reduce the query execution time by up to 20x</p></blockquote><p>User-defined functions, on the other hand, are custom functions that extend the functionality of the database. These can be implemented in the programming languages supported by the DB.</p><p>Furthermore, the project leverages <a href="https://nextjs.org/">NextJS</a> &amp; TypeScript, so you&#8217;ll get some insight into those as well.</p><blockquote><p>Speaking of video streaming services, below are a few recommended reads from my blog that I&#8217;ve written earlier. Do give them a read. 
</p><p><a href="https://scaleyourapp.com/youtube-database-how-does-it-store-so-many-videos-without-running-out-of-storage-space/">How does YouTube store so many videos</a> without running out of storage space?<br><br><a href="https://scaleyourapp.com/youtube-architecture-how-does-it-serve-high-quality-videos-with-low-latency/">How does YouTube serve videos</a> with high quality and low latency? An insight into its architecture.<br><br><a href="https://scaleyourapp.com/how-hotstar-scaled-with-10-3-million-concurrent-users-an-architectural-insight/">How Hotstar scaled with 10.3 million concurrent users</a> &#8211; an architectural insight</p></blockquote><div><hr></div><h2>Project 8: Code a TCP/IP server from scratch</h2><p><strong>Tags</strong>: Backend, Java, Networking, Web</p><p><strong>Level</strong>: Beginner, Intermediate</p><p>In two detailed posts, I've implemented a <a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">single-threaded</a> and a <a href="https://shivangsnewsletter.com/p/distributed-programming-part-3">multi-threaded TCP/IP server</a> from the bare bones in Java. You can follow the posts to implement your TCP/IP server either in Java or the programming language of your choice.</p><p>The single-threaded server handles client requests sequentially in a blocking fashion, one request at a time. 
In this scenario, all subsequent or concurrent client requests are queued until the primary thread of execution is free to handle the next one.</p><p>In contrast, a multithreaded server improves our server's throughput by enabling it to concurrently handle a significantly higher number of client requests in a given time in both blocking and non-blocking fashion.</p><h3>Key learnings from this project</h3><h2>The TCP/IP protocol</h2><p>The TCP/IP (Transmission Control Protocol/Internet Protocol) model is a suite of data communication protocols and is the core of communication that happens over the web.</p><p>It abstracts away most of the intricacies and complexities of network communication from our applications. This may include handling data congestion, ensuring data delivery with accuracy and integrity, preventing the network from being overwhelmed with excessive data with the help of different network algorithms, etc.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6GHT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6GHT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp 424w, https://substackcdn.com/image/fetch/$s_!6GHT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp 848w, 
https://substackcdn.com/image/fetch/$s_!6GHT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp 1272w, https://substackcdn.com/image/fetch/$s_!6GHT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6GHT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp" width="1365" height="845" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:845,&quot;width&quot;:1365,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;TCP IP model&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TCP IP model" title="TCP IP model" srcset="https://substackcdn.com/image/fetch/$s_!6GHT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp 424w, https://substackcdn.com/image/fetch/$s_!6GHT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp 848w, 
https://substackcdn.com/image/fetch/$s_!6GHT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp 1272w, https://substackcdn.com/image/fetch/$s_!6GHT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9ed8252-2239-4582-a53f-f07f610b4c4d_1365x845.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Unlike the OSI model, which has seven layers, the TCP/IP architecture is commonly described with either four or five layers; this post uses the five-layer view. 
They are the application layer, transport layer, internet layer, network interface layer and physical hardware layer.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HY8Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HY8Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp 424w, https://substackcdn.com/image/fetch/$s_!HY8Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp 848w, https://substackcdn.com/image/fetch/$s_!HY8Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp 1272w, https://substackcdn.com/image/fetch/$s_!HY8Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HY8Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp" width="1456" height="858" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:858,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;OSI reference model&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="OSI reference model" title="OSI reference model" srcset="https://substackcdn.com/image/fetch/$s_!HY8Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp 424w, https://substackcdn.com/image/fetch/$s_!HY8Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp 848w, https://substackcdn.com/image/fetch/$s_!HY8Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp 1272w, https://substackcdn.com/image/fetch/$s_!HY8Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b57f86-1827-4a8f-bd0b-1d6bf6a9f369_1488x877.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I've written a <a href="https://scaleyourapp.com/ip-layers-and-the-tcp-ip-model-a-deep-dive">detailed post on the IP layers and the TCP/IP model</a> on my blog. Do check it out. It will give you an overview of the fundamentals before you get down to writing code.</p><h2>TCP/IP &amp; distributed systems</h2><p>TCP/IP servers, running individually or as a cluster, form the core of almost all distributed systems. Distributed systems like Kafka, Redis, etc., rely heavily on this protocol suite for reliable communication, flow and congestion control, network partition handling and more.</p><h2>Redis</h2><p>If we take Redis, for instance, Redis clients connect to Redis servers using TCP connections. Each client connection is handled by a separate socket, ensuring commands and their responses are reliably exchanged. 
Redis typically uses long-lived TCP connections to handle multiple requests and responses, minimizing the overhead of establishing connections repeatedly.</p><p>Its RESP (REdis Serialization Protocol) is designed to work efficiently over TCP. Using TCP, RESP messages are sent from the client to the server and vice versa, ensuring reliable and ordered delivery.</p><p>In the cluster mode, multiple Redis nodes communicate with each other over TCP to manage data partitioning and replication. Nodes exchange information about the cluster state, data distribution, and more.</p><p>Redis further uses TCP connections to replicate data from master nodes to slave nodes. This ensures data redundancy and improves fault tolerance. Initial synchronization and ongoing replication traffic are continually transmitted over TCP, ensuring that the replicas stay consistent with the master.</p><h2>Kafka</h2><p>Kafka producers and consumers connect to Kafka brokers over TCP connections, ensuring the data is reliably produced and consumed. They maintain persistent TCP connections to brokers, reducing the overhead associated with repeatedly establishing connections, just like Redis.</p><p>Kafka uses a leader-follower replication model where the leader broker sends data to follower brokers over TCP. This ensures that data is consistently replicated across the cluster.</p><p>Moreover, its binary communication protocol operates over TCP as well and is optimized for high throughput and low latency.</p><p>This gives an idea of how critical TCP/IP is to implementing efficient, reliable and scalable distributed systems.</p><h2>Intricacies of client-server communication&nbsp;</h2><p>When implementing a TCP/IP server, you'll understand the intricacies of network communication along with gaining good insight into how servers communicate/exchange data over the web. 
In addition, you&#8217;ll understand the essential concepts involved, such as IP addresses, ports and sockets.</p><p>Furthermore, you'll understand the multiple approaches we can leverage to make our server handle concurrent connections, which approach fits which use case, and the pros and cons of each.</p><p><strong>There are primarily five ways to make our server handle concurrent connections:</strong></p><ol><li><p>Spawning a new thread for every client request</p></li><li><p>Thread pooling: Leveraging existing threads from the pool to serve subsequent requests</p></li><li><p>Hybrid approach: A mix of spawning new threads and pooling</p></li><li><p>Non-blocking asynchronous I/O: Enables our server to handle client requests in a non-blocking fashion</p></li><li><p>Event-driven approach: Closely related to the asynchronous, non-blocking approach; here, the server responds to events and processes the requests with event loops and callbacks.</p></li></ol><p>I've touched upon these approaches in my post on <a href="https://shivangsnewsletter.com/p/distributed-programming-part-3">implementing a multithreaded TCP/IP server</a>. Modern servers leverage a mix of these approaches to achieve the desired behavior. They strike a balance between resource utilization, responsiveness and scalability.</p><p>Developers monitor the memory usage, continually tune the code for optimum performance and adapt to changing workload conditions. This may include studying the average number of concurrent requests, the processing time per request, incoming request patterns such as the frequency of traffic spikes, system resource consumption (CPU, memory, I/O capacity), scalability requirements, and so on.</p><p>This project will be a stepping stone into the realm of systems programming. 
I've created a <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">roadmap for distributed system programming here</a>, in case you want to check it out.</p><blockquote><p>Though the code I've written is in Java, you can learn to code distributed systems in the backend programming language of your choice with CodeCrafters (Affiliate).</p><p><a href="https://bit.ly/3swSHHl">CodeCrafters</a> is a platform that helps us code distributed systems like Redis, Docker, Git, a DNS server, and more step-by-step from the bare bones in the programming language of our choice.</p><p>It's designed to help developers learn how to build complex systems from the ground up, focusing on teaching the internals of distributed systems and other related technologies through hands-on, project-based learning.</p><p>Each project is broken down into stages, with each stage focusing on implementing a specific feature or component of the system. Do check it out and kickstart your hands-on systems programming learning.</p><p>If you decide to make a purchase, you can use&nbsp;<a href="https://bit.ly/3QL4TN0">my unique link to get 40% off</a>. </p></blockquote><blockquote><p>More projects are coming soon to this list. My next article will be on how to wrap our heads around a GitHub repository. If you wish to be notified of my future posts, including the new project additions to this list, do <a href="https://shivangsnewsletter.com/">subscribe to this newsletter</a> (if you haven&#8217;t yet). 
Cheers!</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Enabling Our Server to Handle Concurrent Requests By Implementing a Multithreaded TCP/IP Server]]></title><description><![CDATA[In my previous post, I implemented a bare-bones single-threaded TCP/IP server, helping us understand the intricacies of client-server communication over TCP/IP, in addition to, what goes on inside the server when a client request arrives and the related concepts.]]></description><link>https://shivangsnewsletter.com/p/distributed-programming-part-3</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/distributed-programming-part-3</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Mon, 06 May 2024 08:07:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!K1Wg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my previous post, I implemented a bare-bones <a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">single-threaded TCP/IP server</a>, helping us understand the intricacies of client-server communication over TCP/IP, in addition to, what goes on inside the server when a client request arrives&nbsp;and&nbsp;the related concepts. 
In case you haven't read it yet, I recommend reading it before you continue with this post.</p><p>This post discusses the implementation of a multithreaded TCP/IP server. Multithreading improves our server's throughput by enabling it to handle a significantly higher number of client requests concurrently in a given time.</p><p>With that said, let's get started.</p><h2>Ways to make our server handle requests concurrently</h2><p>Our <a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">single-threaded server</a> could handle only one request at a time. All subsequent or concurrent client requests were queued until the primary thread of execution was free to handle the next one.</p><p>In this scenario, while the single thread of execution is busy handling the current client request, subsequent client requests experience delays in receiving a response or, worse, connection timeouts.</p><p>There are multiple ways to make our server handle client requests concurrently as opposed to sequentially, increasing its throughput.</p><h2>1. Multithreading: Spawning a new thread for every client request</h2><p>In this multithreaded approach, our server spawns a new thread to handle every client request. Each thread sends the response to its client and then terminates, completing its lifecycle.</p><p>The downside of this approach is the overhead of thread creation and destruction on every client request. Also, if the number of threads spawned is high, there is the additional overhead of thread context switching.</p><p>The upside is that a new thread handles each client request immediately, without the request having to wait in a queue. This decreases the response latency, thus increasing our server's throughput.</p><h2>2. 
Thread Pooling: Leveraging existing threads from the pool to serve subsequent requests</h2><p>In this approach, our server has a pre-allocated pool of threads to handle the client requests. When a request arrives, the server checks the pool for an available thread and assigns it to the request.</p><p>This approach averts the overhead of creating and destroying a thread every time a client request arrives. However, if all the threads in the pool are occupied, the client requests are queued and may experience delays or timeouts, depending on the server capacity.</p><h2>3. Hybrid approach: A mix of spawning new threads and pooling</h2><p>In the hybrid approach, we leverage a mix of both approaches to optimize our server's performance. Client requests are handled by a thread pool and, if all its threads are busy, the server spawns a new thread to handle the request as opposed to queuing it.</p><p>Server design involves tradeoffs and largely depends on the requirements. If we are focused on saving resources, we could queue the requests until a thread in the pool is free to process them and not spawn new threads. If we are concerned about response latency and throughput, we should spawn new threads whenever the pool is fully occupied.</p><p>Also, thread pools are the best fit where client requests are relatively short-lived and the threads are freed sooner to handle subsequent client requests. In contrast, long-lived client requests may keep all the threads in the pool occupied, requiring us to spawn new threads to handle future requests. In this scenario, a multithreaded approach that spawns a new thread for every request would make more sense.</p><p>We need to keep the right balance between resource utilization and responsiveness. 
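</p><p>As a rough illustration of the hybrid approach described above, here is a minimal sketch that hands a task to a fixed pool when a pool thread is free and spawns a one-off thread otherwise. The class name HybridDispatcher, its dispatch() method and the pool size of 2 are hypothetical and chosen only for this example. Tracking free pool threads with a Semaphore is just one possible way to do the bookkeeping, and the sketch places no upper limit on the number of extra threads, which a real server would likely want.</p><pre><code>import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: the names and sizes are illustrative and not part of the server code in this post.
public class HybridDispatcher {

    private final ExecutorService pool;
    private final Semaphore idleWorkers; // one permit per pool thread that is currently free

    public HybridDispatcher(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
        this.idleWorkers = new Semaphore(poolSize);
    }

    // Returns true if the task was handed to the pool, false if a new thread was spawned for it.
    public boolean dispatch(Runnable task) {
        if (idleWorkers.tryAcquire()) {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    idleWorkers.release(); // mark this pool thread as free again
                }
            });
            return true;
        }
        // Every pool thread is busy: spawn a one-off thread instead of queuing the task.
        new Thread(task).start();
        return false;
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        HybridDispatcher dispatcher = new HybridDispatcher(2);
        CountDownLatch done = new CountDownLatch(4);
        for (int i = 1; i != 5; i++) {
            boolean pooled = dispatcher.dispatch(() -> {
                try {
                    Thread.sleep(200); // simulate request processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.countDown();
            });
            System.out.println("Request " + i + (pooled ? " handled by the pool" : " handled by a new thread"));
        }
        done.await();
        dispatcher.shutdown();
    }
}</code></pre><p>In a server, dispatch() would be called with the handler for each accepted client socket. Capping the number of extra threads would move this sketch back toward queuing, which is the resource-versus-latency tradeoff discussed above.</p><p>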
Understanding the server's memory usage with the help of monitoring tools makes these decisions easier.</p><p>Besides these, there are two other approaches that enable our server to handle concurrent requests efficiently: non-blocking asynchronous I/O and the event-driven approach.</p><h2>4. Non-blocking asynchronous I/O</h2><p>The multithreaded approaches discussed above are blocking in nature: each thread blocks on I/O while it handles its client's request. The non-blocking asynchronous I/O approach enables our server to handle the client requests in a non-blocking fashion.</p><p>Both blocking and non-blocking approaches have their use cases. The multithreaded blocking approach fits best where the requests are CPU-intensive and threads can run in parallel, making the best use of multi-core processors. Also, we have more fine-grained control over thread creation, resource allocation and management.</p><p>In the non-blocking asynchronous approach, no thread of execution is blocked. Instead, the client requests are handled asynchronously and the server can continue processing other tasks while a client request is waiting on I/O. This allows the server to maximize resource utilization and responsiveness.</p><p>The non-blocking approach fits I/O-intensive use cases, where a significant portion of the server's time is spent waiting for external resources, such as disk I/O or network communication. Requests that need to read from or write to the DB are one example of this.</p><h2>5. Event-driven approach</h2><p>The event-driven approach is closely related to the asynchronous, non-blocking approach, where the server responds to events and processes the requests with event loops and callbacks. 
The events are dispatched to the appropriate handlers or callbacks once they are received, and the server keeps running the event loop to handle the client requests in a non-blocking fashion.</p><p>The asynchronous non-blocking approach and the event-driven approach may look similar, but they differ in their implementation, programming models and how they handle concurrency and I/O operations.</p><p>In this post, I'll delve into the implementation of the first two approaches (spawning new threads and thread pooling); the remaining approaches will be discussed in future posts.</p><h2>Implementing a multithreaded server which spawns a new thread for handling every client request</h2><p>Below is the code for a multithreaded TCP server that spawns a new thread to handle each client connection. If you've gone through my previous post, you'll understand the code better.</p><pre><code>import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Assumption: the logger is SLF4J, based on the LoggerFactory API used below.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TCPMultithreadedServer {

    private static final Logger logger = LoggerFactory.getLogger(TCPMultithreadedServer.class);

    public static void main(String[] args) {
        int port = 6545;
        ServerSocket serverSocket = null;

        try {
            serverSocket = new ServerSocket(port);
            logger.info("Multithreaded TCP server listening on port " + port);

            // The main thread only accepts connections; each one is handed to its own thread.
            while (true) {
                Socket clientSocket = serverSocket.accept(); // blocks until a client connects
                logger.info("Client connected: " + clientSocket.getInetAddress());

                Thread clientThread = new Thread(new ClientHandler(clientSocket));
                clientThread.start();
                logger.info("New thread spawned to handle client request with ID: " + clientThread.getId());
            }
        } catch (IOException e) {
            logger.error("Error: " + e.getMessage(), e);
        } finally {
            try {
                if (serverSocket != null) {
                    serverSocket.close();
                }
            } catch (IOException e) {
                logger.error("Error closing server socket: " + e.getMessage(), e);
            }
        }
    }

    // Holds the per-connection logic; one instance runs on one thread.
    private static class ClientHandler implements Runnable {
        private final Socket clientSocket;

        public ClientHandler(Socket clientSocket) {
            this.clientSocket = clientSocket;
        }

        @Override
        public void run() {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
                 PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true)) {
                String message;
                // readLine() returns null once the client closes its side of the connection.
                while ((message = in.readLine()) != null) {
                    logger.info("Message received from the client: " + message);
                    out.println("Server received: " + message); // Echo back to client
                    logger.info("Message sent back to the client: " + message);
                }
            } catch (IOException e) {
                logger.error("Error handling client: " + e.getMessage(), e);
            } finally {
                try {
                    clientSocket.close();
                    logger.info("Client disconnected.");
                } catch (IOException e) {
                    logger.error("Error closing client socket: " + e.getMessage(), e);
                }
            }
        }
    }
}</code></pre><p>The program creates a server socket object and binds it to port 6545. The socket listens for incoming client connections via the accept() method. I've discussed sockets, ports and client-server connections over TCP in my former post in detail. <a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">Please refer to it for more information</a>.&nbsp;</p><p>I'll directly get to the part where our server spawns a new thread to process a client request.&nbsp;</p><pre><code>Thread clientThread = new Thread(new ClientHandler(clientSocket));
clientThread.start();</code></pre><p>The above code creates a new thread every time the server accepts a client connection. A ClientHandler object, which holds the logic for processing the client request, is passed to the Thread constructor. The ClientHandler constructor, in turn, takes clientSocket as an argument.</p><p>The ClientHandler class implements the Runnable interface and its run() method.</p><p>In Java, the Runnable interface defines a task that a thread can execute. Instances of a class that implements Runnable can be run on separate threads by passing them to a Thread object.</p><p>Every time a client request arrives, a new thread is spawned with a ClientHandler and the thread executes its run() method. Each client request is processed in its own thread. This way, clients don't have to wait while the server processes the current requests; the requests are processed concurrently (and in parallel where CPU cores are available) and the throughput of our server increases.</p><p>The ClientHandler class is implemented as a static nested class to encapsulate the logic for processing a client request. Isolating the request processing logic makes our code more modular, organized and readable. We can further reuse it in different contexts when required.</p><p>In addition, having the processing logic in a separate class lets us keep per-connection state in instance fields (clientSocket here) alongside the behavior. This is more awkward to do cleanly when the code lives inside a single method.</p><p>Declaring ClientHandler as static means it holds no implicit reference to an instance of the outer class, so handler threads cannot accidentally share the outer object's state. This helps with thread safety.</p><p>I tested the server with a client program and here are the logs. 
You'll find the client program in my previous newsletter post as well.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K1Wg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K1Wg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png 424w, https://substackcdn.com/image/fetch/$s_!K1Wg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png 848w, https://substackcdn.com/image/fetch/$s_!K1Wg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png 1272w, https://substackcdn.com/image/fetch/$s_!K1Wg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K1Wg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png" width="1362" height="506" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b132075c-3eec-460b-8696-6bb308838d44_1362x506.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:506,&quot;width&quot;:1362,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79837,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K1Wg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png 424w, https://substackcdn.com/image/fetch/$s_!K1Wg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png 848w, https://substackcdn.com/image/fetch/$s_!K1Wg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png 1272w, https://substackcdn.com/image/fetch/$s_!K1Wg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PIA8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PIA8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png 424w, https://substackcdn.com/image/fetch/$s_!PIA8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png 848w, 
https://substackcdn.com/image/fetch/$s_!PIA8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png 1272w, https://substackcdn.com/image/fetch/$s_!PIA8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PIA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png" width="1352" height="297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:297,&quot;width&quot;:1352,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42275,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PIA8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png 424w, https://substackcdn.com/image/fetch/$s_!PIA8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png 848w, 
https://substackcdn.com/image/fetch/$s_!PIA8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png 1272w, https://substackcdn.com/image/fetch/$s_!PIA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634d031b-3dc6-4ea3-9446-260e16848ec2_1352x297.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The server is hit by five concurrent requests, which it handles by spawning a new thread for every request. </p><p>In Java, thread IDs are long integers, and they are what we see in the logs. The JVM assigns them&nbsp;sequentially&nbsp;as threads are created.</p><h2>Request response flow&nbsp;</h2><p>When the main thread of our server receives a client request, it delegates the work to a newly spawned thread and immediately returns to listening for other connections.&nbsp;This&nbsp;allows our server to handle multiple client connections concurrently, which wasn't possible in our single-threaded TCP server.&nbsp;</p><p>The ability to handle multiple connections concurrently increases our server's responsiveness and throughput. This concurrent processing model is a common approach in servers that&nbsp;handle&nbsp;a large volume of&nbsp;concurrent&nbsp;client requests.&nbsp;</p><h2>The accept() method is still blocking in nature. What if a large number of client requests arrive concurrently?</h2><p>Even though our server handles client requests in parallel on separate threads, the accept() call on the main thread is still blocking.&nbsp;</p><p>The main thread accepts client connections one at a time and then delegates them to individual threads. 
If a large number of requests arrive while the main thread is busy handing a connection off to a thread, the operating system queues the pending connections.&nbsp;</p><p>This queue is often referred to as the "backlog queue," and its length determines the maximum number of pending connections that can wait for the server to accept them. Once the main thread is free, it accepts them in the order they were received.&nbsp;</p><p>Moreover, it's essential to understand that the server's capacity to handle concurrent connections is not determined solely by the operating system's queue.&nbsp;Factors such as system resources (CPU, memory), network bandwidth and the server's implementation (e.g., thread management, I/O handling, etc.) also play crucial roles in determining the server's scalability and performance.</p><p>Now, let's move on to the&nbsp;next&nbsp;implementation, where our server leverages a thread pool to handle concurrent client requests.&nbsp;</p><h2>Implementing a multithreaded server that handles concurrent client requests with thread pooling</h2><p>Below is the code for a multithreaded TCP server that maintains a thread pool to handle client requests as opposed to spawning a new thread every time.&nbsp;</p><pre><code>public class TCPThreadPoolServer {

    private static final int PORT = 6555;
    private static final int THREAD_POOL_SIZE = 8;

    private static final Logger logger = LoggerFactory.getLogger(TCPThreadPoolServer.class);

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(THREAD_POOL_SIZE);

        try (ServerSocket serverSocket = new ServerSocket(PORT)) {
            logger.info("Thread pool TCP server listening on port " + PORT);

            while (true) {
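                // accept() blocks until the next client connects.
                Socket clientSocket = serverSocket.accept();
                logger.info("Client connected: " + clientSocket.getInetAddress());

                // Hand the connection to the pool; if every thread is busy, the task waits in the executor's queue.
                executor.execute(new ClientHandler(clientSocket));
                logger.info("Client request submitted to thread pool for processing.");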
            }
        } catch (IOException e) {
            logger.error("Error: " + e.getMessage(), e);
        } finally {
            executor.shutdown();
        }
    }

    private static class ClientHandler implements Runnable {
        private final Socket clientSocket;

        public ClientHandler(Socket clientSocket) {
            this.clientSocket = clientSocket;
        }

        @Override
        public void run() {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
                 PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true)) {
                String message;
                while ((message = in.readLine()) != null) {
                    logger.info("Message received from the client: " + message);
                    out.println("Server received: " + message);
                    logger.info("Message sent back to the client: " + message);
                }
            } catch (IOException e) {
                logger.error("Error handling client: " + e.getMessage(), e);
            } finally {
                try {
                    clientSocket.close();
                    logger.info("Client disconnected.");
                } catch (IOException e) {
                    logger.error("Error closing client socket: " + e.getMessage(), e);
                }
            }
        }
    }
}</code></pre><p>This server program is quite similar to the earlier server program where we spawned a new thread for every client request. The primary difference is that here we leverage the Java Executor framework to create and manage a thread pool that handles client requests.&nbsp;</p><p>Instead of spawning a new thread for every client request, the Executor framework maintains a pool of reusable threads, which cuts down on per-request thread creation overhead. A fixed thread pool creates its threads as tasks arrive, up to the configured size, and then keeps them alive until the pool is shut down.&nbsp;</p><pre><code>ExecutorService executor = Executors.newFixedThreadPool(THREAD_POOL_SIZE);</code></pre><p>The above line of code creates a fixed-size thread pool, where&nbsp;THREAD_POOL_SIZE&nbsp;is the maximum number of threads in the pool.</p><pre><code>executor.execute(new ClientHandler(clientSocket));</code></pre><p>The ClientHandler task is submitted to the thread pool for execution. Since each task runs on a pool thread, the server can serve up to THREAD_POOL_SIZE client connections&nbsp;concurrently.</p><pre><code>finally {
    executor.shutdown();
}</code></pre><p>In the&nbsp;finally&nbsp;block, we shut down the thread pool when the server stops: the pool stops accepting new tasks, lets the already submitted ones finish, and then releases its threads.&nbsp;</p><p>In the Java ecosystem, the Executor framework is widely used when dealing with threads, as it provides better thread management than handling threads explicitly, along with features such as task queuing and scheduling.</p><p>With the Executor framework, we can create different types of thread pools, such as:</p><ul><li><p><strong>Fixed thread pool</strong>: Has a fixed number of threads.</p></li><li><p><strong>Cached thread pool</strong>: Dynamically adjusts the number of threads based on the workload. It reuses idle threads and spawns new ones when required.</p></li><li><p><strong>Single thread pool</strong>: Maintains a single thread in the pool, so tasks run one after another.</p></li><li><p><strong>Scheduled thread pool</strong>: Allows tasks to be scheduled for execution after a given delay or periodically, at a fixed rate or with a fixed delay.</p></li><li><p><strong>Work stealing pool</strong>:&nbsp;This pool leverages the work-stealing algorithm to achieve high throughput and load balancing.&nbsp;Each thread in the pool has its&nbsp;own&nbsp;task queue, and&nbsp;idle threads steal tasks from the queues of other threads to keep&nbsp;themselves&nbsp;busy.&nbsp;This pool is a good fit for parallel task processing use cases.</p></li></ul><h2>Queuing requests in an internal task queue with thread pooling</h2><p>When all the threads in the pool are occupied&nbsp;and&nbsp;a client request arrives, the Executor framework places that request in an internal task queue. When a thread becomes available, it picks up the request from the task queue. 
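</p><p>To see this queuing in action, here is a minimal sketch that is separate from the server code above. The class name TaskQueueDemo and the method queuedTasks are hypothetical, and the cast to ThreadPoolExecutor assumes that Executors.newFixedThreadPool returns a ThreadPoolExecutor, which holds on OpenJDK but isn't promised by the API docs. Every task parks on a latch, so the pool threads stay busy while the remaining tasks wait in the queue.</p><pre><code>import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TaskQueueDemo {

    // Submits `tasks` blocking tasks to a fixed pool of `poolSize` threads and
    // returns how many of them are waiting in the executor's internal queue.
    static int queuedTasks(int poolSize, int tasks) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        // Assumption: the returned pool is a ThreadPoolExecutor (true on OpenJDK),
        // which lets us inspect the internal task queue.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
        CountDownLatch release = new CountDownLatch(1);

        for (int i = 0; i != tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // keep the pool thread busy until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The first `poolSize` tasks each got their own thread; the rest are queued.
        int queued = pool.getQueue().size();

        release.countDown(); // let every task finish
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return queued;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Queued tasks: " + queuedTasks(2, 5)); // prints "Queued tasks: 3"
    }
}</code></pre><p>With a pool of two threads and five submitted tasks, two tasks run right away and the other three sit in the queue until a thread frees up. 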
In addition, we can also set a task rejection policy based on our requirements.&nbsp;</p><p>Here are the server logs on running our thread pool program and sending concurrent requests to it.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VkeM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VkeM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png 424w, https://substackcdn.com/image/fetch/$s_!VkeM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png 848w, https://substackcdn.com/image/fetch/$s_!VkeM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png 1272w, https://substackcdn.com/image/fetch/$s_!VkeM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VkeM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png" width="1375" height="592" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:1375,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:97038,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VkeM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png 424w, https://substackcdn.com/image/fetch/$s_!VkeM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png 848w, https://substackcdn.com/image/fetch/$s_!VkeM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png 1272w, https://substackcdn.com/image/fetch/$s_!VkeM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc74b4698-357b-46b5-8faf-ca2af7ff68fc_1375x592.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R_XN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R_XN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png 424w, https://substackcdn.com/image/fetch/$s_!R_XN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png 848w, 
https://substackcdn.com/image/fetch/$s_!R_XN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png 1272w, https://substackcdn.com/image/fetch/$s_!R_XN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R_XN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png" width="1363" height="570" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:1363,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89730,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R_XN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png 424w, https://substackcdn.com/image/fetch/$s_!R_XN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png 848w, 
https://substackcdn.com/image/fetch/$s_!R_XN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png 1272w, https://substackcdn.com/image/fetch/$s_!R_XN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba73235-569d-4ad1-a9fc-6eb63734926c_1363x570.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!d2iG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d2iG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png 424w, https://substackcdn.com/image/fetch/$s_!d2iG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png 848w, https://substackcdn.com/image/fetch/$s_!d2iG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png 1272w, https://substackcdn.com/image/fetch/$s_!d2iG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d2iG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png" width="1362" height="193" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:193,&quot;width&quot;:1362,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26072,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d2iG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png 424w, https://substackcdn.com/image/fetch/$s_!d2iG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png 848w, https://substackcdn.com/image/fetch/$s_!d2iG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png 1272w, https://substackcdn.com/image/fetch/$s_!d2iG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9529296b-ed8d-4678-8fb0-df91c11c0ab8_1362x193.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The server was hit by nine concurrent requests and each is handled by a thread from the pool. 
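</p><p>For reference, concurrent requests like these could be generated with a small client along the following lines. This is a sketch under assumptions: the class name ConcurrentClients and the helper sendOnce are hypothetical, it assumes the server above is already running on localhost and port 6555, and it is not necessarily the client that produced the logs shown.</p><pre><code>import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

public class ConcurrentClients {

    // Opens one connection, sends a single line and returns the server's reply.
    static String sendOnce(String host, int port, String message) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
            out.println(message);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int clients = 9; // one more than the server's THREAD_POOL_SIZE of 8
        ExecutorService executor = Executors.newFixedThreadPool(clients);

        // Each client runs on its own thread so that the requests overlap.
        IntStream.rangeClosed(1, clients).forEach(i -> executor.execute(() -> {
            try {
                System.out.println(sendOnce("localhost", 6555, "hello from client " + i));
            } catch (IOException e) {
                System.err.println("Client " + i + " failed: " + e.getMessage());
            }
        }));

        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}</code></pre><p>Each client should print a "Server received: ..." reply. Since the server's pool holds eight threads, a ninth connection that arrives while all of them are busy waits in the executor's task queue until a thread is free.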
</p><h2>A hybrid approach employing both thread pooling and spawning a new thread&nbsp;</h2><p>Spawning a new thread for every client request and reusing threads from a pool are two specific approaches to handling client requests.&nbsp;</p><p>Modern servers combine multiple approaches to achieve the desired behavior, since both thread pooling and thread spawning have their pros and cons. I've discussed this a bit at the beginning of this post.&nbsp;</p><p>Modern servers strike a balance between resource utilization, responsiveness&nbsp;and&nbsp;scalability. Devs continually monitor the memory usage and tune the code for optimum performance as workload conditions change.&nbsp;This&nbsp;may include studying the average number of concurrent requests, processing time per request, incoming request patterns like the frequency of traffic spikes, system resource consumption such as CPU, memory and I/O capacity, scalability requirements, and so on.&nbsp;</p><blockquote><p>Though the code in this post is in Java, you can learn to code distributed systems in the backend programming language of your choice with CodeCrafters (Affiliate).&nbsp;</p><p><a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>&nbsp;is a platform that helps us code distributed systems like Redis, Docker, Git, a DNS server, and more step-by-step from the bare bones in the programming language of our choice. With their hands-on courses, we not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare our project with the community and then finally navigate the official source code to see how it&#8217;s done. 
It&#8217;s a head start to becoming an OSS contributor.</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase.</p></blockquote><p>System design and software architecture concepts, along with distributed systems fundamentals, are vital both for system design interviews and for coding distributed systems.&nbsp;</p><blockquote><p>If you wish to master the fundamentals, check out my&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency learning path</a>, comprising three courses that go through all the concepts starting from zero in an easy-to-understand language. The courses educate you, step by step, on the domain of software architecture, cloud infrastructure and distributed services design.</p></blockquote><p>You can also check out several system design case studies and blog articles that I have written in this&nbsp;<a href="https://shivangsnewsletter.com/">newsletter</a>&nbsp;and&nbsp;<a href="https://scaleyourapp.com/">my blog</a>.&nbsp;</p><p>Well, this pretty much&nbsp;sums up the implementation of our TCP server,&nbsp;which&nbsp;can handle multiple clients concurrently.&nbsp;In my next post,&nbsp;I'll&nbsp;possibly&nbsp;delve into creating a node cluster and implementing a replicated state machine.&nbsp;</p><p>If you found this newsletter post helpful, consider sharing it with your friends for more reach.</p><p>If you are reading the&nbsp;<a href="https://shivangsnewsletter.com/">web version of this post</a>, consider subscribing to get my posts delivered to your inbox as soon as they are published.</p><p>You can get a 50% discount on my courses by sharing my posts with your network. Based on referrals, you can unlock course discounts. Check out the&nbsp;<a href="https://shivangsnewsletter.com/leaderboard">leaderboard page</a>&nbsp;for details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pQGu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pQGu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!pQGu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!pQGu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!pQGu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pQGu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png" width="982" height="367" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:982,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pQGu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!pQGu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!pQGu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!pQGu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfbdbdcc-e56d-4634-af10-7787add3ab02_982x367.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a>&nbsp;as well. I'll see you in the next post. Until then, Chao!</p>]]></content:encoded></item><item><title><![CDATA[Implementing a Single-threaded Blocking Bare Bones TCP/IP Server]]></title><description><![CDATA[This newsletter post is a continuation of my previous introductory post on coding distributed systems from the bare bones. If you haven't read it yet, I recommend giving it a read.]]></description><link>https://shivangsnewsletter.com/p/distributed-programming-part-2</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/distributed-programming-part-2</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Wed, 10 Apr 2024 05:50:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G1E3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This newsletter post is a continuation of my previous introductory post on <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">coding distributed systems from the bare bones</a>. If you haven't read it yet, I recommend giving it a read.&nbsp;</p><p>In this post, I'll be discussing the implementation of a basic TCP/IP server that will, down the road, enable us to implement a node cluster for implementing replicated state machines (You'll find the background info on this in the previous post). 
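With that being said, let's get started.&nbsp;</p><h2>Implementing a TCP/IP Server&nbsp;</h2><p>Here is the code for a bare-bones single-threaded blocking TCP/IP server:</p><pre><code>import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Assumption: logging goes through SLF4J; the Logger/LoggerFactory usage below matches its API.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TCPServer {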

    private static final Logger logger = LoggerFactory.getLogger(TCPServer.class);

    public static void main(String[] args) {

        int port = 6523;
        ServerSocket serverSocket = null;

        try {
            serverSocket = new ServerSocket(port);
            logger.info("TCP server listening on port " + port);

            while (true) {
                Socket clientSocket = serverSocket.accept();
                logger.info("Client connected: " + clientSocket.getInetAddress());

                BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
                PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true);

                String message;
                while ((message = in.readLine()) != null) {
                    logger.info("Message received from the client: " + message);
                    out.println("Server received: " + message);
                    logger.info("Message sent back to the client: " + message);
                }

                clientSocket.close();
                logger.info("Client disconnected.");
            }
        } catch (IOException e) {
            logger.error("Error: " + e.getMessage(), e);
        } finally {
            try {
                if (serverSocket != null) {
                    serverSocket.close();
                }
            } catch (IOException e) {
                logger.error("Error closing server socket: " + e.getMessage(), e);
            }
        }
    }
}</code></pre><p>Let's break it down step by step and go over the concepts involved along the way.</p><pre><code>public static void main(String[] args) {</code></pre><p>Java's main method needs little explanation: it is the entry point of a Java program (that's why it's called main). Because it is static, the JVM can call it once the class is loaded, without creating an instance of the class first.&nbsp;</p><pre><code>int port = 6523;
ServerSocket serverSocket = null;

try {
  serverSocket = new ServerSocket(port);
  logger.info("TCP server listening on port " + port);</code></pre><p>In the above lines of code, we have a port and a socket. What are these, and how do they work together? Let's quickly go over them.&nbsp;</p><h2>IP Address, Ports, and Sockets&nbsp;</h2><h2>IP Address</h2><p>For the client to connect to the server, it needs to know the server's IP address. When the client sends a request to the server over TCP/IP, the request is encapsulated in a series of network packets, each carrying the source IP and the destination IP.&nbsp;</p><p>Upon receiving the request, the server reads the client's IP address from the incoming packets, which lets it send the response back to the right machine. This makes IP an essential component of web communication.&nbsp;</p><p>Every machine online has an IP (Internet Protocol) address, through which machines communicate with each other over the internet. The IP address is an identifier assigned to a device. DNS resolvers translate human-readable domain names into IP addresses, so end users can connect to a machine without memorizing its hard-to-remember IP address.</p><p>TCP (Transmission Control Protocol) enables reliable data exchange between machines over the internet. Most web communication today runs over the TCP/IP model, whether it&#8217;s the communication between the client and the server, sending emails, transferring files, and so on. You can read more about the underlying <a href="https://scaleyourapp.com/ip-layers-and-the-tcp-ip-model-a-deep-dive/">IP layers in network communication, including the TCP/IP and the OSI model</a>, in my blog post.</p><h2>Ports&nbsp;</h2><p>A port is a logical communication endpoint managed by the machine's OS and identified by a 16-bit number (0 to 65535). A port number is always associated with an IP address.&nbsp;</p><p>The IP address helps us determine the machine we intend to connect to, and the port number helps us determine the service or program running on that machine that we intend to interact with. 
The service can be an email service, a web page rendering service, an FTP service, etc.&nbsp;</p><p>Well-known port numbers are reserved, by convention, for specific services. For instance, HTTP uses port 80 by default and HTTPS port 443; FTP uses port 21, SSH port 22, SMTP (email) port 25, DNS port 53, NTP port 123, and so on. Ports tell a server which service should handle the data arriving on them, and a client can connect to different ports of a server to reach different services.&nbsp;</p><p>As an example, the TCP server runs on my local machine on port 6523. When the client sends a request to localhost:6523, it connects to the TCP server.&nbsp;</p><blockquote><p>Since the server and the client program are both running on my local machine, the client can send its request to localhost:6523. If the client were running on a separate machine, I would have to replace localhost with the IP address of the machine running the server.</p></blockquote><h2>Sockets&nbsp;</h2><p>Sockets are bound to ports and can be seen as the endpoints of communication between processes. The TCP connection between the client and the server is facilitated by two endpoints, aka sockets (the client socket and the server socket).&nbsp;</p><h2>Hopping Back to Our Code</h2><pre><code>int port = 6523;
ServerSocket serverSocket = null;

try {
   serverSocket = new ServerSocket(port);
   logger.info("TCP server listening on port " + port);

   while (true) {
      Socket clientSocket = serverSocket.accept();
      logger.info("Client connected: " + clientSocket.getInetAddress());</code></pre><p>6523 is the port our program listens on. serverSocket = new ServerSocket(port) creates a server socket object and binds it to port 6523. This socket listens for incoming client connections, which we pick up via the accept() method: Socket clientSocket = serverSocket.accept();&nbsp;</p><p>The accept() method of the ServerSocket object accepts an incoming client connection and returns a Socket instance, which represents the server's end of the connection to that client.&nbsp;</p><p>When our server starts, the accept() method runs and blocks the server thread of execution. In other words, it suspends the execution of the program until a client connection request arrives.</p><p>In network programming, server sockets like this one use blocking I/O operations to wait for incoming client connections and to send and receive data. A blocking operation puts the calling thread into a waiting state, during which it consumes minimal CPU resources.</p><p>When a client connection request arrives, the accept() method returns a socket object representing an established connection between the client and the server, enabling bidirectional data exchange.&nbsp;</p><p>The clientSocket object encapsulates the communication channel between the server and the specific client that initiated the connection.</p><p>After the connection is established, the server proceeds to handle the client request (whatever that is).&nbsp;</p><h2>Running While Loop Indefinitely</h2><pre><code>while (true) {</code></pre><p>We use the while loop to enable the server to continuously listen for incoming client connections via the accept() method. 
Once the current client request is handled, the server loops back to the accept() method, which waits for the next incoming client connection request.&nbsp;</p><p>This loop continues indefinitely, enabling the server to handle multiple client connections sequentially.&nbsp;</p><pre><code>logger.info("Client connected: " + clientSocket.getInetAddress());</code></pre><p>The clientSocket.getInetAddress() method retrieves the IP address of the client.&nbsp;</p><h2>Communicating with the Client via Streams</h2><p>The communication between the client and the server involves exchanging bytes, which in Java is based on streams. A stream represents a sequence of data that can be read from or written to, and it is how Java handles input/output operations.</p><pre><code>BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true);</code></pre><p>After the connection with the client is established, the server handles the request by reading data from the clientSocket input stream and writing the response to its output stream.&nbsp;</p><p>The InputStreamReader class takes the byte stream from the underlying input stream and decodes it into a character stream (using the default charset, since we don't specify one). The BufferedReader class buffers that character stream and lets us read the text sent by the client line by line.</p><blockquote><p>There are two types of streams in Java: byte streams and character streams. Byte streams are used to handle raw binary data, such as images, audio files, and other non-text data. They operate at the byte level and provide a way to read and write raw bytes of data. </p><p>Character streams are used to handle text data. These streams are built on top of byte streams and are capable of converting bytes to characters and vice versa. The InputStreamReader is the bridge between the two: it wraps a byte stream and exposes it as a character stream. The BufferedReader is a character stream class that adds buffering on top.&nbsp;</p><p>Depending on the type of data being processed, we can choose the fitting stream type to perform input/output operations efficiently.</p></blockquote><p>Data travels over the network as a sequence of bytes. However, when dealing with text-based communication, such as sending and receiving strings, it's more convenient to work with characters than with raw bytes. This is why we converted the byte stream to a character stream.</p><p>Converting the byte stream to a character stream allows us to perform text processing tasks (if required), such as parsing, tokenization, and manipulation, directly on the text data, without having to deal with low-level byte manipulation. 
When dealing with textual data, it's more natural to work with characters as opposed to raw bytes.</p><p>The PrintWriter class wraps the client socket's output stream so that we can write text to it. It does the reverse operation, converting character data to bytes before writing them to the output stream. The second constructor argument, true, enables auto-flush, so every println() call pushes the message out immediately instead of leaving it in a buffer.&nbsp;</p><h2>Reading Data &amp; Echoing It Back to the Client</h2><pre><code>String message;
while ((message = in.readLine()) != null) {
   logger.info("Message received from the client: " + message);
   out.println("Server received: " + message);
   logger.info("Message sent back to the client: " + message);
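}</code></pre><p>The while loop keeps running as long as readLine() returns a message. readLine() blocks until a full line arrives and returns null once the end of the stream is reached, i.e., when the client closes the connection, which ends the loop. out is the PrintWriter object that writes the message to the output stream.&nbsp;</p><pre><code>clientSocket.close();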
logger.info("Client disconnected.");</code></pre><p>Once the communication is complete, the server closes the client socket so that its resources are released.&nbsp;</p><pre><code>        } catch (IOException e) {
            logger.error("Error: " + e.getMessage(), e);
        } finally {
            try {
                if (serverSocket != null) {
                    serverSocket.close();
                }
            } catch (IOException e) {
                logger.error("Error closing server socket: " + e.getMessage(), e);
            }
        }
    }
}</code></pre><p>This part of the code logs any IOException that arises and, in the finally block, closes the server socket to release its resources. Note that the catch block sits outside the while loop, so an I/O error on any single client connection ends the loop and shuts the server down.</p><p>So, folks, we've covered the bare-bones implementation of a TCP/IP server along with the associated concepts. We've understood the underpinnings of the communication between the client and the server over the network, <a href="https://scaleyourapp.com/ip-layers-and-the-tcp-ip-model-a-deep-dive/">the different layers of the OSI model involved</a> and so on.</p><p>Our TCP/IP server listens for incoming client connections and echoes back any messages received from them. Once the client disconnects, the server closes its end of the connection and waits for the next client.&nbsp;</p><p>Time to test our server.&nbsp;</p><h2>Testing Our TCP/IP Server&nbsp;</h2><p>I've used Telnet to test the remote connection with the server. Telnet stands for Teletype Network. It's a client-server protocol that enables one machine to connect to another over the network.&nbsp;</p><p>Since the server program is running on my local machine, I connected to it via the cmd command 'telnet localhost 6523'. 6523 is the port number the program is running on.&nbsp;</p><p>I then sent a message to the server and received it back. Our TCP/IP server is confirmed up and running. 
Woohoo!</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S6K6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S6K6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png 424w, https://substackcdn.com/image/fetch/$s_!S6K6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png 848w, https://substackcdn.com/image/fetch/$s_!S6K6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png 1272w, https://substackcdn.com/image/fetch/$s_!S6K6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S6K6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png" width="1456" height="272" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:272,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14928,&quot;alt&quot;:&quot;Testing TCP server via Telnet 
client&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Testing TCP server via Telnet client" title="Testing TCP server via Telnet client" srcset="https://substackcdn.com/image/fetch/$s_!S6K6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png 424w, https://substackcdn.com/image/fetch/$s_!S6K6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png 848w, https://substackcdn.com/image/fetch/$s_!S6K6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png 1272w, https://substackcdn.com/image/fetch/$s_!S6K6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d49d56c-141d-43ee-ae8a-9671fa8911f5_1507x282.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Here are our server logs:&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Za7Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Za7Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png 424w, https://substackcdn.com/image/fetch/$s_!Za7Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png 848w, https://substackcdn.com/image/fetch/$s_!Za7Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png 1272w, https://substackcdn.com/image/fetch/$s_!Za7Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Za7Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png" width="1428" height="142" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:142,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21140,&quot;alt&quot;:&quot;TCP server logs&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TCP server logs" title="TCP server logs" 
srcset="https://substackcdn.com/image/fetch/$s_!Za7Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png 424w, https://substackcdn.com/image/fetch/$s_!Za7Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png 848w, https://substackcdn.com/image/fetch/$s_!Za7Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png 1272w, https://substackcdn.com/image/fetch/$s_!Za7Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01fe760a-04e8-4c8a-b950-b41601701b7d_1428x142.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>Single-threaded Blocking Behavior</h2><p>As I mentioned before, our TCP/IP server is a single-threaded blocking server that handles client requests sequentially, one request at a time. To test this, I wrote a client program that spawns five threads to send concurrent connection requests to the server.&nbsp;</p><pre><code>import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Assumption: logging goes through SLF4J; the Logger/LoggerFactory usage below matches its API.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TCPConcurrentClients {

    private static final Logger logger = LoggerFactory.getLogger(TCPConcurrentClients.class);

    public static void main(String[] args) {
        int concurrentClients = 5;
        String serverAddress = "localhost";
        int serverPort = 6523;

        for (int i = 0; i &lt; concurrentClients; i++) {
            Thread clientThread = new Thread(() -&gt; {
                String name = Thread.currentThread().getName();
                logger.info("Client " + name + " connecting to server...");

                // try-with-resources closes the socket even if an exception is thrown midway
                try (Socket clientSocket = new Socket(serverAddress, serverPort)) {
                    logger.info("Client " + name + " connected to server.");

                    BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
                    PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true);

                    logger.info("Client " + name + " sending message to server...");
                    out.println("Hello from client " + name);

                    // Blocks until the server echoes the line back
                    String response = in.readLine();
                    logger.info("Response from server to client " + name + ": " + response);
                } catch (IOException e) {
                    logger.error("Client " + name + " failed: " + e.getMessage(), e);
                    return;
                }
                logger.info("Client " + name + " disconnected from server.");
            });
            clientThread.start();
        }
    }
}</code></pre><p>The above program fires concurrent requests to our server, and our server handles them sequentially. While the server is busy with one client, the remaining connection requests wait in the operating system's backlog queue for the listening socket until accept() picks them up, or in an explicit request queue if we implement one (more on this in the upcoming posts). These requests may also experience delays or timeouts, depending on the server implementation.&nbsp;</p><p>Here are the server logs that show the sequential handling of the clients' requests:&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G1E3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G1E3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png 424w, https://substackcdn.com/image/fetch/$s_!G1E3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png 848w, https://substackcdn.com/image/fetch/$s_!G1E3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png 1272w, https://substackcdn.com/image/fetch/$s_!G1E3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!G1E3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png" width="1227" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1227,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83036,&quot;alt&quot;:&quot;TCP server single threaded blocking behavior&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TCP server single threaded blocking behavior" title="TCP server single threaded blocking behavior" srcset="https://substackcdn.com/image/fetch/$s_!G1E3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png 424w, https://substackcdn.com/image/fetch/$s_!G1E3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png 848w, https://substackcdn.com/image/fetch/$s_!G1E3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png 1272w, https://substackcdn.com/image/fetch/$s_!G1E3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8571038a-fb35-4a32-8b5b-a28e1792c9ac_1227x495.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>The server receives the connection request of a thread, reads the message, echoes it back, closes the connection and processes the subsequent thread request.&nbsp;</p><p>In my next post, I&#8217;ve delved into how our server can handle concurrent requests to increase the request throughput.&nbsp;Check it out.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e4e71a5d-6051-4e2d-a603-eb6a8c354e99&quot;,&quot;caption&quot;:&quot;In my previous post, I implemented a bare-bones single-threaded TCP/IP server, helping us understand the intricacies of client-server communication over 
TCP/IP, in addition to, what goes on inside the server when a client request arrives and the related concepts. In case you haven't read it yet, it's a recommended read before you get on with this post.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Distributed Programming Part 3 - Enabling Our Server to Handle Concurrent Requests By Implementing a Multithreaded TCP/IP Server&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:26351479,&quot;name&quot;:&quot;Shivang Sarawagi&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6fc3e7f-7fb6-4643-a309-abb6386407f5_1536x2048.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-06T08:07:44.730Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb132075c-3eec-460b-8696-6bb308838d44_1362x506.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://shivangsnewsletter.com/p/distributed-programming-part-3&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144353162,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Web Scale&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d35d7e-83e0-445f-99b4-7a2f9f9c37b5_500x500.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p> </p><blockquote><p>Though the code in this post is in Java, you can learn to code distributed systems in the backend programming 
language of your choice with CodeCrafters (Affiliate).&nbsp;</p><p><a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a> is a platform that helps us code distributed systems like Redis, Docker, Git, a DNS server, and more step-by-step from the bare bones in the programming language of our choice. With their hands-on courses, we not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare our project with the community and then finally navigate the official source code to see how it&#8217;s done. It&#8217;s a headstart to becoming an OSS contributor.</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase. Cheers!</p></blockquote><p>Both the system design/software architecture concepts and the distributed system fundamentals are a vital part of the system design interviews and coding distributed systems.&nbsp;</p><blockquote><p>If you wish to master the fundamentals, check out my&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency learning path</a>, comprising three courses that go through all the concepts starting from zero in an easy-to-understand language. 
The courses educate you, step by step, on the domain of software architecture, cloud infrastructure and distributed services design.</p></blockquote><p>You can also check out several system design case studies and blog articles that I have written in this&nbsp;<a href="https://shivangsnewsletter.com/">newsletter</a>&nbsp;and&nbsp;<a href="https://scaleyourapp.com/">my blog</a>.&nbsp;</p><p>If you found this newsletter post helpful, consider sharing it with your friends for more reach.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/p/distributed-programming-part-2?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/p/distributed-programming-part-2?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>If you are reading the&nbsp;<a href="https://shivangsnewsletter.com/">web version of this post</a>, consider subscribing to get my posts delivered to your inbox as soon as they are published.</p><p>You can get a 50% discount on my courses by sharing my posts with your network. Based on referrals, you can unlock course discounts. 
Check out the&nbsp;<a href="https://shivangsnewsletter.com/leaderboard">leaderboard page</a>&nbsp;for details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mR1f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mR1f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!mR1f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!mR1f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!mR1f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mR1f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png" width="982" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:982,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mR1f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!mR1f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!mR1f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!mR1f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95f1c76a-ca6c-4301-877a-80e4714186b9_982x367.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a>&nbsp;as well. I'll see you in the next post. Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[Kickstart your distributed systems programming and architecture learning with this post]]></title><description><![CDATA[Distributed systems is a term that is used across layers in the compute stack.]]></description><link>https://shivangsnewsletter.com/p/distributed-programming-part-1</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/distributed-programming-part-1</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Sat, 30 Mar 2024 08:17:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!loey!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Distributed systems is a term that is used across layers in the compute stack. For instance, large-scale services like Uber, Netflix, YouTube and the like are called distributed systems, distributed services, or more commonly, web services.</p><p>Services like these, handling millions of concurrent users at any point in time, are highly available, scalable and reliable. Under the covers, they are powered by scalable distributed systems such as Kafka, Redis, MongoDB, Cassandra, and so on, which enable these services to process millions of concurrent users across different cloud regions globally with high throughput, consistency, reliability, and availability.</p><p>Distributed is a term that is applicable at multiple levels. 
Systems such as Kafka, Redis, etc., are distributed as they run across clusters comprising thousands of nodes deployed globally. They are distributed at the cluster or the infrastructure level.</p><p>In contrast, consumer-facing services like Netflix, Uber, etc., are called distributed in nature as they comprise thousands of microservices deployed across the globe, working together to deliver the desired functionality.</p><p>Now, let's quickly look at the distributed compute/infrastructure stack, comprising multiple layers.</p><h2>Distributed compute stack&nbsp;</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!loey!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!loey!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png 424w, https://substackcdn.com/image/fetch/$s_!loey!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png 848w, https://substackcdn.com/image/fetch/$s_!loey!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png 1272w, https://substackcdn.com/image/fetch/$s_!loey!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!loey!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png" width="1456" height="732" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:732,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:257742,&quot;alt&quot;:&quot;Distributed compute infrastructure stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Distributed compute infrastructure stack" title="Distributed compute infrastructure stack" srcset="https://substackcdn.com/image/fetch/$s_!loey!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png 424w, https://substackcdn.com/image/fetch/$s_!loey!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png 848w, https://substackcdn.com/image/fetch/$s_!loey!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png 1272w, https://substackcdn.com/image/fetch/$s_!loey!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a1b365-7af7-4b14-912c-bf2685dfdc5f_5282x2656.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>The top layer in the stack is the application software layer, which holds the consumer-facing web services like Uber, Spotify, Netflix and so on.</p><p>The application layer is followed by the cluster infrastructure layer, which holds the systems that power the services running above it, such as distributed databases (Redis, MongoDB, Cassandra), message brokers (Kafka), distributed caches (Redis), distributed file systems, and cluster coordination systems (ZooKeeper, etcd).&nbsp;</p><p>These systems take care of things like <a 
href="https://shivangsnewsletter.com/p/understanding-database-consistency">distributed data coordination</a> (eventual consistency, strong consistency, etc.), availability, scalability, and so on, without the need for the application programmer to worry too much about the underlying intricacies of how clusters dynamically adjust their size, how low-level memory operations are executed, how the clusters ensure fault tolerance and such.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-ezO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-ezO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png 424w, https://substackcdn.com/image/fetch/$s_!-ezO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png 848w, https://substackcdn.com/image/fetch/$s_!-ezO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png 1272w, https://substackcdn.com/image/fetch/$s_!-ezO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-ezO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png" width="1456" height="829" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:829,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:526356,&quot;alt&quot;:&quot;database consistency&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="database consistency" title="database consistency" srcset="https://substackcdn.com/image/fetch/$s_!-ezO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png 424w, https://substackcdn.com/image/fetch/$s_!-ezO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png 848w, https://substackcdn.com/image/fetch/$s_!-ezO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png 1272w, https://substackcdn.com/image/fetch/$s_!-ezO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faae97177-9a0c-443c-b82e-ed5b1e267b6d_7735x4405.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><blockquote><p>To delve into the details of how large-scale services maintain data consistency when deployed across the globe, check out my <a href="https://shivangsnewsletter.com/p/understanding-database-consistency">applying DB consistency in a web service</a> post. </p><p>Furthermore, a case study post on <a href="https://shivangsnewsletter.com/p/distributed-database">distributing our database in different cloud regions globally to manage load &amp; latency</a> is a recommended read.</p></blockquote><p>The cluster infrastructure layer is followed by the platform software layer, which holds the operating system, device drivers, firmware, virtualization setup, and such. 
This layer provides an abstraction enabling the distributed systems to interact with the underlying hardware they run on.</p><blockquote><p>We can further split the OS and the virtualization layer into two, but for simplicity, we&#8217;ll keep them together for now. </p><p>If you wish to delve into the specifics of infrastructure, deployment, and virtualization, the two posts below are recommended reads: </p><p><a href="https://shivangsnewsletter.com/p/why-doesnt-cloudflare-use-containers">Why doesn't Cloudflare use containers in their Workers platform infrastructure?</a> </p><p><a href="https://shivangsnewsletter.com/p/how-unikraft-cloud-reduces-serverless">How Unikraft Cloud reduces serverless cold starts to milliseconds with unikernels and microVMs</a></p></blockquote><p>The platform software layer is followed by the physical infrastructure layer, which consists of the compute servers mounted in server racks, physical storage consisting of hard drives and flash SSDs, the RAM attached to the compute servers, and the data center network that connects all the physical infrastructure.</p><p>Furthermore, a vertical layer cuts across all the layers of the compute stack: the monitoring layer. 
This layer contains monitoring code or systems that ensure good service health and continual uptime of components running in all the layers.&nbsp;</p><p>Keeping track of metrics like server uptime, performance, latency, throughput, resource consumption, etc., is crucial for running performant software.&nbsp;This is more commonly known as <a href="https://shivangsnewsletter.com/p/observability-in-distributed-systems">observability</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FrRl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FrRl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png 424w, https://substackcdn.com/image/fetch/$s_!FrRl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png 848w, https://substackcdn.com/image/fetch/$s_!FrRl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png 1272w, https://substackcdn.com/image/fetch/$s_!FrRl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!FrRl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png" width="1456" height="879" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:879,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:306693,&quot;alt&quot;:&quot;Observability&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Observability" title="Observability" srcset="https://substackcdn.com/image/fetch/$s_!FrRl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png 424w, https://substackcdn.com/image/fetch/$s_!FrRl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png 848w, https://substackcdn.com/image/fetch/$s_!FrRl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png 1272w, https://substackcdn.com/image/fetch/$s_!FrRl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303dd1b0-5992-4f51-959c-2e7dcad14dd1_5892x3558.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I've published a detailed post on it. <a href="https://shivangsnewsletter.com/p/observability-in-distributed-systems">Do check it out</a>.</p><h2>Systems programming</h2><p>We now have an idea of the difference between a distributed web service (running at the top of the compute stack) and a distributed system that runs across nodes in a cluster, interacting with the hardware. 
Let's step into the realm of systems programming.</p><p>Systems programming primarily entails writing performant software that directly interacts with the hardware and makes the most of the underlying resources.</p><p>When we run this software on multiple nodes in a cluster, the system becomes distributed in nature, entailing the application of multiple concepts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2e8z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2e8z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png 424w, https://substackcdn.com/image/fetch/$s_!2e8z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png 848w, https://substackcdn.com/image/fetch/$s_!2e8z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!2e8z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2e8z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png" width="1456" height="878" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180363,&quot;alt&quot;:&quot;Node cluster&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Node cluster" title="Node cluster" srcset="https://substackcdn.com/image/fetch/$s_!2e8z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png 424w, https://substackcdn.com/image/fetch/$s_!2e8z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png 848w, https://substackcdn.com/image/fetch/$s_!2e8z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!2e8z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F698e3d67-cffb-4c8d-8053-8e318c005651_2546x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I have started a crash course on <a href="https://shivangsnewsletter.com/p/systems-programming-coding-message-broker">building a distributed message broker like Kafka</a> from the bare bones. Do give it a read to begin your distributed systems programming learning.</p><p>I am also listing the concepts below that are ideally involved when we implement a distributed system. 
P.S. Though I am listing the concepts below for reference, we don't need to know them all to start coding a distributed system for learning.</p><h2>Concepts</h2><h2>Distributed systems fundamentals</h2><p>The nodes of our distributed system communicate over the network and the system should address the complexities of network communication, concurrency, data consistency and fault tolerance.</p><p>This requires knowledge of networking protocols such as TCP/IP and HTTP, serialization mechanisms, and communication styles like gRPC and REST.</p><p>To keep the data consistent between the nodes, including keeping the nodes in sync, we need to understand consensus algorithms like Raft and Paxos, as well as the CAP theorem, in addition to understanding the <a href="https://shivangsnewsletter.com/p/understanding-database-consistency">data consistency models</a> and distributed transactions.</p><p>To manage distributed transactions in a scalable and resilient way, distributed services leverage techniques such as distributed logging, event sourcing, state machines, distributed cluster caching, and such.</p><p>To implement high availability, we need to be aware of concepts like redundancy, replication, retries, and failover. For scalability, several load-distribution strategies are leveraged, such as sharding, partitioning, caching, deploying dedicated load balancers, and so on.</p><p>These are foundational distributed systems concepts that are applicable when implementing a scalable, available, reliable, and resilient system.</p><p><a href="https://shivangsnewsletter.com/p/systems-programming-coding-message-broker">In my crash course</a>, I've started with the bare bones of what a cluster node is and continued to code a commit log running on a single node. 
More posts discussing the distributed scenarios are coming soon.</p><p>To understand the TCP/IP protocol and concepts like IP addresses, ports, and sockets, including how servers function, I've implemented a single-threaded and a multi-threaded server in Java. Do check out the posts below:</p><blockquote><p><a href="https://shivangsnewsletter.com/p/distributed-programming-part-2">Implementing a single-threaded blocking TCP/IP server</a> </p><p><a href="https://shivangsnewsletter.com/p/distributed-programming-part-3">Enabling our server to handle concurrent requests by implementing a multi-threaded TCP/IP server</a></p></blockquote><h2>System design and architecture</h2><p>System design and architecture fundamentals go along with the distributed systems fundamentals. These concepts revolve more around implementing the services in the application layer on top of the compute stack we discussed before.</p><p>These are the web architecture and cloud fundamentals, including things like the client-server architecture, basics of message queues, databases, load balancing, microservices, different data consistency models, API design, communication styles like REST and RPC, concurrent request processing, data modeling, design patterns like publish-subscribe, event sourcing, saga, circuit-breaker, leader-follower, CQRS, and so on.</p><p>Moreover, knowledge of system design and distributed systems fundamentals is crucial when preparing for system design interviews.</p><blockquote><p>If you wish to get a grip on the fundamentals starting from zero in a structured and easy-to-understand way, check out my <a href="https://learnsoftwarearchitecture.com/">zero to system architecture learning path</a>, where you will learn the fundamentals of web architecture, cloud, and distributed systems in addition to learning to design scalable services like YouTube, Netflix, ESPN and the like. 
</p><p>In addition, I've added several system design case studies to my&nbsp;<a href="https://shivangsnewsletter.com/">newsletter</a>. Do give those a read as well.&nbsp;</p></blockquote><h2>Systems programming language</h2><p>Ideally, distributed systems are implemented in programming languages that provide low-level control over hardware resources such as memory, CPU, and I/O. This is key for writing performant systems.</p><p>Popular systems programming languages are C, C++, Rust, Go, Elixir, and Java (to some extent).</p><p>Distributed systems often require concurrency and parallelism support for multitasking. Go and Java provide robust concurrency constructs to handle concurrent scenarios effectively.&nbsp;&nbsp;</p><p>C, C++ and Rust fit best for writing ultra-low latency systems where memory efficiency and direct hardware access are critical, such as databases, caches, web servers, distributed storage, and so on.</p><p>Go has built-in support for concurrency. It also has a garbage collector and offers good performance. Systems such as Kubernetes, Docker, etcd and CockroachDB are written in Go.</p><p>Elixir is designed for writing highly concurrent, distributed systems with a focus on fault tolerance and lightweight processes.</p><p>Java offers built-in support for concurrency (supporting both I/O and CPU-intensive applications), provides networking APIs, and has a mature ecosystem. Java code can be deployed across different environments that support the JVM.&nbsp;</p><p>Some of the distributed systems implemented in Java are Apache Hadoop, ZooKeeper, Elasticsearch, Apache Storm, Flink, Cassandra, HBase, and Apache Ignite.</p><p>Kafka is written in Scala and Java; Scala is a JVM-based language that is interoperable with Java. Many of Kafka's libraries and connectors are written in Java. </p><p>Performant systems can be implemented in Java, but as I mentioned before, when we need ultra-low latency in our systems, languages such as C, C++, and Rust are preferred. 
Java is more widely used to implement enterprise services than as a systems programming language.</p><p>However, if you just want to learn the underlying concepts for knowledge and not code systems to be used in production, you can code these systems in any programming language of your choice.</p><p>I've implemented the single-threaded and multi-threaded TCP/IP servers in Java and I am using Go to implement the message broker.</p><p>Systems and services can be implemented in any language as long as it fits the requirements. More importantly, understanding the underlying concepts is crucial.</p><blockquote><h2>Practice writing system software with CodeCrafters in the programming language of your choice</h2><p><a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a> is a platform that helps you develop a solid grasp of writing system software. With their interactive, hands-on exercises, you learn to code systems like Git, Redis, Kafka, and more from the bare bones in the programming language of your choice.</p><p>You can follow my newsletter posts for detailed discussions on systems programming and architecture, and for hands-on practice, you can begin your learning with CodeCrafters to become a better engineer.</p><p>Check out their programming challenges and if you decide to make a purchase, you can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;(affiliate).</p></blockquote><h2>Data structures and algorithms</h2><p>Along with distributed systems fundamentals, we should be aware of the fundamentals of data structures, networking and OS.</p><blockquote><p>Again, as I mentioned before, we need not know everything; we only need to know what is required as we move forward. I am listing the topics for reference.</p></blockquote><p>Data structures and algorithms are crucial for building performant systems. 
When our systems get hit with traffic on the scale of millions of requests, data structures and algorithms enable our systems to process data and requests efficiently at blazing speed.</p><p>For instance, knowledge of B-trees and hash tables is crucial for implementing efficient data storage and retrieval systems. Merkle trees are leveraged for data synchronization in a distributed environment.</p><p>Bloom filters can help improve lookup times, reduce memory consumption, and optimize query performance in distributed databases and caches.&nbsp;</p><p>Algorithms, such as distributed consensus algorithms, help us manage varied levels of data consistency across the nodes. Consistent hashing is leveraged to distribute and replicate data efficiently across the cluster.</p><p>Techniques like weighted routing and other dynamic routing algorithms help optimize resource utilization, system performance, and so on.</p><p>You get the idea. DSA is foundational in building optimized and performant systems.&nbsp;</p><h2>Networking&nbsp;</h2><p>Knowledge of low-level networking, including socket programming and how network communication happens in a distributed environment, is crucial in implementing distributed systems. Knowing network protocols such as TCP/IP, HTTP, UDP, etc., helps us understand how data is transmitted across machines/nodes in a network.&nbsp;</p><p>Distributed systems at a low level rely on protocols such as TCP/IP to reliably transmit data across a distributed environment, while HTTP, as we all know, is what the web runs on. 
It's the fundamental protocol for client-server communication.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!osZs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!osZs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png 424w, https://substackcdn.com/image/fetch/$s_!osZs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png 848w, https://substackcdn.com/image/fetch/$s_!osZs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png 1272w, https://substackcdn.com/image/fetch/$s_!osZs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!osZs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png" width="1456" height="942" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:942,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:206462,&quot;alt&quot;:&quot;TCP IP model&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="TCP IP model" title="TCP IP model" srcset="https://substackcdn.com/image/fetch/$s_!osZs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png 424w, https://substackcdn.com/image/fetch/$s_!osZs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png 848w, https://substackcdn.com/image/fetch/$s_!osZs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png 1272w, https://substackcdn.com/image/fetch/$s_!osZs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12e3127b-2eed-4971-8dca-edab5e93eedb_2916x1886.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Furthermore, we should understand the network topologies that enable us to create scalable software capable of functioning with optimized network latency and minimal communication overhead across different cloud regions and availability zones globally.&nbsp;</p><h2>Operating systems&nbsp;</h2><p>Understanding memory management concepts such as virtual memory, paging, memory allocation, resource management, inter-process communication and scheduling algorithms helps optimize memory usage and task execution in distributed systems.&nbsp;</p><p>Knowledge of processes, threads, locks, semaphores, monitors, OS user space, kernel space, etc., enables us to code efficient systems, ensuring optimum resource utilization. This subsequently augments our systems' throughput and responsiveness.&nbsp;</p><p>For instance, Kafka's performance relies on the implementation of the zero-copy optimization principle. 
It transfers data from the OS page cache straight to the network socket (for example, via the sendfile system call on Linux) instead of copying it through application buffers in user space. This approach minimizes latency and maximizes throughput by cutting redundant data copies and context switches between kernel space and user space.&nbsp;</p><p>You'll find the specifics on this in my newsletter as I delve deeper into the message broker crash course. Stay tuned.</p><p>Furthermore, understanding file systems helps us code systems intended to handle large-scale data storage and retrieval like Google Bigtable, Hadoop, etc.</p><blockquote><p>I actively share new insights on the domain on my social media handles. Do follow me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a> and&nbsp;<a href="https://twitter.com/shivang_z">X</a> to receive insights and updates in your home feed.</p><p>Additionally, you can also subscribe to this newsletter to receive the content in your inbox as soon as it is published.</p></blockquote><h2>Deployment &amp; Cloud</h2><p>Finally, our software needs to be deployed in a distributed environment to test its functioning. This may require the use of technologies such as Docker, Kubernetes and possibly a cloud platform like AWS, GCP or Azure. This requires bare-bones knowledge of the cloud.</p><blockquote><p>If you are hazy on the cloud fundamentals, my&nbsp;<a href="https://learnsoftwarearchitecture.com/">zero to system architecture learning path</a> has got you covered. 
It contains a dedicated course that covers <a href="https://learnsoftwarearchitecture.com/cloud-computing-101-master-the-fundamentals">cloud fundamentals</a>, including concepts like serverless, running complex workflows, vendor lock-in, cloud infrastructure, how large-scale services are deployed across different cloud regions and availability zones globally, deployment workflow and the associated infrastructure and technologies.</p></blockquote><p>Now that we've discussed the list of topics we need to be aware of to implement a distributed system, you can proceed with the <a href="https://shivangsnewsletter.com/p/systems-programming-coding-message-broker">first post on the message broker crash course</a>.</p><p>If you found this post insightful, do share it with your friends for more reach. You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>.</p><p>I'll see you in the next post.</p><p>Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[System Design Case Study #5: Serverless Compute & Storage At the Edge With Stateless & Stateful Functions]]></title><description><![CDATA[Picture a scenario where we need to set up the inventory management infrastructure globally across different cloud regions for a massive sports apparel company.]]></description><link>https://shivangsnewsletter.com/p/serverless</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/serverless</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Sun, 04 Feb 2024 11:04:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lXFt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Picture a scenario where we need to set up the inventory management infrastructure globally across different cloud regions for a massive sports 
apparel company.&nbsp;</p><p>The business operates with a localized approach where the products are tailored to the local preferences and demands of the respective cloud region. Since the products are consumed locally, it's a good idea to manage the inventory locally in different cloud regions as opposed to streaming the data to a central cloud region from respective cloud regions.</p><p>This means we need cloud region-specific deployments to manage the local inventory. However, we may also need to stream some data to the central cloud region for overall aggregated inventory analytics and such.&nbsp;</p><p>I have discussed independent cloud region-specific deployments in my earlier post: <a href="https://shivangsnewsletter.com/p/distributed-database">Distributing our database in different cloud regions globally to manage load &amp; latency</a>. If you haven't read it, it's a recommended read.&nbsp;</p><p>This post focuses on serverless deployments and the intricacies involved in managing state in event-driven serverless services.&nbsp;</p><p>To begin with, why do we even need serverless compute and the serverless database? Why not move forward with a conventional API-driven backend?</p><h2>Why Serverless Compute &amp; Serverless Database In Cloud-Region Specific Deployments</h2><p>Because our use case is event-driven. A set of operations gets triggered only when a product is added or updated in the inventory. We do not have to run our servers all the time. This will save us significant compute costs.</p><p>For instance, when a new product is added to our inventory, it triggers a set of operations: the <a href="https://shivangsnewsletter.com/p/image-processing-pipeline">product image goes into the S3 object store</a>, the description, price, count, etc. are processed by the serverless function, and the system may further resize the image, extract metadata and so on. 
Eventually, the product data is stored in the local cloud-region serverless database.&nbsp;</p><p>If it weren't for serverless, we would have to run our servers continually, which would require ongoing infrastructure provisioning and management, in addition to the associated idle server running costs.</p><p>Serverless infrastructure enables us to focus on code and on implementing business logic as opposed to worrying about scaling and managing the underlying infrastructure. Developers just write code and run it on serverless functions without worrying about how the backend will scale when subjected to rising traffic.&nbsp;</p><p>With serverless functions, the compute only runs when required, i.e., when events get triggered. Moreover, our current use case does not require the backend to store any user state. The product upload process is mostly stateless. All this makes serverless functions and a serverless database a good fit for our use case.&nbsp;</p><p>Now, let's look at our serverless architecture.</p><h2>Serverless Architecture</h2><p>The event of a new product being added to the inventory is handled by a serverless function, which then writes the product data to the serverless database.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lXFt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lXFt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png 424w, 
https://substackcdn.com/image/fetch/$s_!lXFt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png 848w, https://substackcdn.com/image/fetch/$s_!lXFt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png 1272w, https://substackcdn.com/image/fetch/$s_!lXFt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lXFt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1455773,&quot;alt&quot;:&quot;Serverless Architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Serverless Architecture" title="Serverless Architecture" srcset="https://substackcdn.com/image/fetch/$s_!lXFt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png 424w, 
https://substackcdn.com/image/fetch/$s_!lXFt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png 848w, https://substackcdn.com/image/fetch/$s_!lXFt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png 1272w, https://substackcdn.com/image/fetch/$s_!lXFt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac7bdbc-c2d4-4fd8-b53a-b530a09651b9_6737x3788.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Local serverless databases at respective cloud regions help us keep track of product availability, which is helpful in managing restocking and optimizing the retail store shelves.&nbsp;</p><p>Also, as new products are added, updated and removed from the local serverless databases, certain data is streamed to the central cloud region asynchronously or synchronously for overall analytics.&nbsp;</p><p>The central database may contain the master inventory dataset, giving an overview of product availability in all the global stores and a centralized view of the entire inventory. This helps the business run analytics for demand forecasting and other related scenarios to understand the whole retail chain.&nbsp;</p><p>Cloud providers often charge for data transfer across regions, so we need to keep that in mind when designing our architecture, as well. This will be a key aspect in ascertaining what data goes into the central cloud region and what stays in local cloud region databases.&nbsp;</p><h2>New System Requirements</h2><p>The above serverless architecture enabled the production warehouses to update the product data in the inventory system. This gave the company insights into its inventory in real-time.&nbsp;</p><p>Now, we have a new requirement. Since all the inventory data is stored in local serverless databases, we intend to make it the single source of truth for the inventory information.&nbsp;</p><p>We need to integrate the inventory management system with the retail shops' IoT devices/systems as well. 
Whenever a purchase transaction happens in a local retail shop, the shop's system will trigger an event to be handled by the retail transaction serverless function and the function will update the product count in the same inventory serverless database.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hDYg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hDYg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png 424w, https://substackcdn.com/image/fetch/$s_!hDYg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png 848w, https://substackcdn.com/image/fetch/$s_!hDYg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png 1272w, https://substackcdn.com/image/fetch/$s_!hDYg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hDYg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png" width="1456" height="843" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:843,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1013656,&quot;alt&quot;:&quot;Serverless&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Serverless" title="Serverless" srcset="https://substackcdn.com/image/fetch/$s_!hDYg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png 424w, https://substackcdn.com/image/fetch/$s_!hDYg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png 848w, https://substackcdn.com/image/fetch/$s_!hDYg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png 1272w, https://substackcdn.com/image/fetch/$s_!hDYg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97db4bd-d260-444d-8a02-87e3ccb99455_5815x3367.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This way, all the product purchases from the retail stores in specific cloud regions will update the available inventory count in the same inventory database, ensuring a consistent inventory. We do not have to set up a separate retail transaction inventory database and sync it with our production warehouse inventory database in real time. This will keep things simple.</p><p>Our initial production warehouse to product inventory update use case was stateless, but in the retail purchase scenario, we need to store state with the serverless functions on the backend for efficient processing. 
For this, we would need stateful serverless functions.</p><p>Let's look at our retail transaction serverless architecture.</p><h2>Retail Transaction Serverless Architecture</h2><p>Our retail stores are equipped with IoT devices that trigger specific events for product sales, customer data updates, etc.</p><p>These events are handled by specific serverless functions on the backend. For these events, we need to store some state with the functions as well for efficient processing. This would help with managing long-lived transactions, managing orders based on customer preferences and such. Storing state with the serverless functions helps reduce the load on the database as well, which improves application latency.</p><p>In case you are hazy on what application state is, check out the detailed post on application state - <a href="https://scaleyourapp.com/stateless-and-stateful-services/">A discussion on stateless &amp; stateful services (Managing user state on the backend)</a>, on my blog.</p><p>A general notion is that serverless is stateless and best suits stateless use cases. But with stateful serverless functions, we can store state as well.&nbsp;</p><p>Besides the inventory database, the retail store backend will have a separate database to manage the customer data, purchases and other information. 
In addition, synchronously or asynchronously, this information will be streamed to the central cloud region to enable the business to gauge its regional performance and customer behavior in different regions and compute overall analytics.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I9RW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I9RW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png 424w, https://substackcdn.com/image/fetch/$s_!I9RW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png 848w, https://substackcdn.com/image/fetch/$s_!I9RW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png 1272w, https://substackcdn.com/image/fetch/$s_!I9RW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I9RW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png" width="1456" height="843" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:843,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1169541,&quot;alt&quot;:&quot;Stateful functions&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Stateful functions" title="Stateful functions" srcset="https://substackcdn.com/image/fetch/$s_!I9RW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png 424w, https://substackcdn.com/image/fetch/$s_!I9RW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png 848w, https://substackcdn.com/image/fetch/$s_!I9RW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png 1272w, https://substackcdn.com/image/fetch/$s_!I9RW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4168e050-1fdd-4709-aefa-5650370fce77_5815x3367.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Separate Stateless &amp; Stateful Serverless Functions</h2><p>For the product updates from the production warehouse, we will have stateless serverless functions handling the events. For the retail product purchases, we will have stateful serverless functions handling the events.</p><p>The stateless functions will handle stateless tasks, like adding or updating products, and will be short-lived. 
The stateful functions will manage the long-lived state spanning multiple requests and complex business logic.&nbsp;</p><p>Both types of serverless functions will have separate code and deployments since the use cases are different and the cloud platform can scale and optimize their performance separately based on the load.&nbsp;</p><p>Now, let's look into stateful functions in detail to understand them better.</p><h2>Stateful Serverless Functions&nbsp;</h2><p>Stateful functions typically maintain the state across requests by leveraging external storage like a latency-optimized database or a key-value store. These stores are optimized for quick reads and writes, with a focus on low-latency access to the state information.&nbsp;</p><p>In addition, some serverless cloud products provide in-memory storage as well for functions to store state during a request. This avoids repeated round trips to the database for state info, thus improving system efficiency.</p><p>So, to manage state across requests, we have external storage, and to manage state temporarily within the span of a request, we have in-memory storage.&nbsp;</p><p>Some serverless solutions also offer storage built into the serverless functions, giving them an efficient way to work with the state.&nbsp;</p><p>The state is stored along with some unique request attribute, such as a client ID or an authentication token, that enables the serverless platform to recognize requests from the same client and associate each request with its stored state.&nbsp;</p><p>Stateful serverless functions are an extension to the existing serverless infrastructure, enabling functions to run orchestrated workflows, work across multiple requests, and so on. For instance, Apache Flink Stateful Functions are built on top of Apache Flink and leverage several of Flink's features. 
For consistent state guarantees, including fault tolerance and scalability, they rely on co-locating state and messaging in the cluster, a technique inherent to Apache Flink.&nbsp;</p><p>With Flink's stateful functions, the state is managed by the stateful functions themselves and shared with other stateful functions when required. Devs do not have to worry about manually storing the state in an external database. Flink handles this for us.</p><p>Similarly, Azure Durable Functions are an extension to Azure Functions. Their durable state enables devs to run complex workflows, long-running tasks, etc.&nbsp;</p><p>We learned that state can be added to serverless functions via external storage, but why add state to a serverless function that is built to be inherently stateless? Why not deploy a conventional API-driven backend instead?</p><h2>Serverless vs. Conventional API-driven Backend&nbsp;</h2><p>Picking a conventional API-driven backend or a serverless architecture largely depends upon the business use case, application characteristics, requirements, and complexity. Distributed system design is nuanced. There are many factors and trade-offs to consider when designing a scalable and available system.&nbsp;</p><p>However, in our case, the first reason to pick the serverless architecture and then add state to the functions is that we have an event-driven architecture. Our service does not run all the time, unlike, for instance, a video streaming service where the users are active on the website most of the time and we have to keep the servers running.</p><p>Our stateful serverless functions run only when a retail purchase happens. So, we save money by not having servers sit idle most of the time, plus we do not have to invest resources in managing the infrastructure and optimizing it to scale with the increased load. Everything is handled by the cloud provider. 
There is little operational overhead on our side.</p><p>Additionally, stateful serverless fits well with short-lived stateful computations. In contrast, if the computation were long-running, complex and less event-driven, a conventional API backend would be a better fit.&nbsp;</p><p>An example of this would be a WebSocket implementation with a persistent connection to the backend. This sort of use case would be challenging to implement with serverless functions.&nbsp;</p><p>At the same time, it's essential to know that since the serverless infrastructure is managed by the cloud provider, we are locked in to that vendor. Adapting our code to new, complex requirements may get tricky to the point of rewriting several modules of the service from scratch since we have minimal control over the infrastructure.&nbsp;</p><p>The best bet for deciding the right architecture and technology stack for our use case is to do a POC (Proof Of Concept) and benchmark it.</p><p>I wrote a blog article earlier on how <a href="https://scaleyourapp.com/how-discord-scaled-their-member-update-feature/">Discord scaled their member update feature benchmarking different data structures</a>. It is a good read. 
Check it out.&nbsp;</p><blockquote><p>If you wish to learn the fundamentals of distributed system design, including concepts like serverless, running complex workflows, vendor lock-in, cloud infrastructure, how large-scale services are deployed across different cloud regions and availability zones globally, fundamentals of web architecture, how to pick the right technology for your use case and more, check out the&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency</a>&nbsp;learning path.&nbsp;</p><p>It's a series of three courses authored by me intended to help you master the fundamentals and the intricacies of designing distributed systems like ESPN, Netflix, YouTube, and more.</p></blockquote><blockquote><p>Additionally, if you wish to learn to code distributed systems from the bare bones, I am running a series on it in this newsletter. <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">Do check it out here</a>.&nbsp;</p><p>If you wish to practice coding distributed systems like Redis, Docker, Git, a DNS server and more from the bare bones in the programming language of your choice, check out&nbsp;<a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>&nbsp;(Affiliate). 
With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it&#8217;s done.&nbsp;</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase.</p></blockquote><p>If you found this newsletter post helpful, consider sharing it with your friends for more reach.&nbsp;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/p/serverless?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/p/serverless?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>If you are reading the <a href="https://shivangsnewsletter.com/">web version of this post</a>, consider subscribing to get my posts delivered to your inbox as soon as they are published.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/subscribe?"><span>Subscribe now</span></a></p><p>You&#8217;ll find the previous system design case study here:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;939f8fa3-77f6-4581-be15-3abdab0288b8&quot;,&quot;caption&quot;:&quot;Shopify leverages DB replication for redundancy and failure recovery, in addition to setting up read replicas as an alternative read-only data source for read operations. 
This reduces the read load on their primary database nodes as the read requests can be routed to the read replicas and the primary nodes can have more bandwidth to handle more write op&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;System Design Case Study #4: How Shopify Implemented Read Consistency Across Their Database Replicas For A Consistent User Experience&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:26351479,&quot;name&quot;:&quot;Shivang Sarawagi&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/770fbb16-c4a1-4351-9ca6-e1381dff7dc1_1536x2048.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-01-27T07:54:50.483Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://shivangsnewsletter.com/p/system-design-case-study-4-how-shopify&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:141064021,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Web Scale&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770fbb16-c4a1-4351-9ca6-e1381dff7dc1_1536x2048.jpeg&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p> You can get a 50% discount on my courses by sharing my posts with your network. 
Based on referrals, you can unlock course discounts. Check out the&nbsp;<a href="https://shivangsnewsletter.com/leaderboard">leaderboard page</a>&nbsp;for details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1uyD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1uyD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!1uyD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!1uyD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!1uyD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1uyD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png" width="982" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:982,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1uyD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!1uyD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!1uyD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!1uyD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F471b1402-1078-4633-858d-5d1ad4a9d51e_982x367.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a>&nbsp;as well. I'll see you in the next post. 
Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[System Design Case Study #4: How Shopify Implemented Read Consistency Across Their Database Replicas For A Consistent User Experience]]></title><description><![CDATA[Shopify leverages DB replication for redundancy and failure recovery, in addition to setting up read replicas as an alternative read-only data source for read operations.]]></description><link>https://shivangsnewsletter.com/p/system-design-case-study-4-how-shopify</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/system-design-case-study-4-how-shopify</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Sat, 27 Jan 2024 07:54:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!yc5W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Shopify leverages DB replication for redundancy and failure recovery, in addition to setting up read replicas as an alternative read-only data source for read operations. This reduces the read load on their primary database nodes as the read requests can be routed to the read replicas and the primary nodes can have more bandwidth to handle more write operations. 
This enhances the system's request throughput and overall performance.</p><p>The setup of primary nodes and read replicas works great, but having a number of read replicas working along with the primary nodes introduces some replication lag in the system since syncing writes to all the read replicas requires some time.</p><p>While the replicas are being synced, requests fetching data from the read replicas might get inconsistent data since, at any given point, not all read replicas may have caught up with the primary.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yc5W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yc5W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png 424w, https://substackcdn.com/image/fetch/$s_!yc5W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png 848w, https://substackcdn.com/image/fetch/$s_!yc5W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png 1272w, https://substackcdn.com/image/fetch/$s_!yc5W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yc5W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png" width="1456" height="829" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:829,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:632042,&quot;alt&quot;:&quot;Read your write consistency&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Read your write consistency" title="Read your write consistency" srcset="https://substackcdn.com/image/fetch/$s_!yc5W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png 424w, https://substackcdn.com/image/fetch/$s_!yc5W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png 848w, https://substackcdn.com/image/fetch/$s_!yc5W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png 1272w, https://substackcdn.com/image/fetch/$s_!yc5W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cab8e48-a7df-4b5b-94c9-a20017757af9_5206x2964.png 1456w" sizes="100vw" fetchpriority="high"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Please go through my former newsletter posts for more context. I've delved deep into data consistency and data distribution across different cloud regions:</p><p>1. <a href="https://shivangsnewsletter.com/p/distributed-database">Distributing Our Database In Different Cloud Regions Globally To Manage Load &amp; Latency.</a></p><p>2. 
<a href="https://shivangsnewsletter.com/p/understanding-database-consistency">Understanding Database Consistency Levels And Applying Them To A Single Web Service.</a></p><p>Since the distributed read replicas are not strongly consistent, in a scenario where data for a certain query is fetched from multiple replicas, assembled and returned to the users, the results of these queries fired by multiple clients will be highly unpredictable.&nbsp;</p><p>And what if, based on these results, the users perform further writes in the application? This will leave the system in a highly inconsistent state, the end user confused and our support team with an inbox cluttered with customer complaints.</p><p>To avert this scenario, Shopify needed a consistent system. They contemplated <a href="https://shivangsnewsletter.com/p/understanding-database-consistency">different consistency models</a>, such as strong consistency, causal consistency, monotonic read consistency, etc., based on their use case.&nbsp;</p><h2>Strong Consistency&nbsp;</h2><p>Strong consistency would suit best if the goal were a system in which users never see subtle variations in data. But it also meant updating all the read replicas synchronously after every write performed on the primary node.&nbsp;</p><p>If the synchronization failed or had an issue, the write would be unsuccessful. This consistency model wasn't the best choice since synchronously updating the replicas would add latency to the write operations, hurting the system's performance.&nbsp;</p><h2>Causal Consistency&nbsp;</h2><p>They next considered causal consistency, implementing it via GTID (Global Transaction Identifier). Every write on the primary server would have a unique GTID that would be replicated across the replicas. 
Based on the presence of a specific GTID, the system would ascertain the staleness of the data on a certain node by comparing it with the state on the primary node.&nbsp;</p><p>The issue with this approach was that every replica would have to run additional logic to stream its state to a component of the system that, by actively comparing the states of the different read replicas and the primary nodes, would decide which DB node to route a given query to. This was clearly additional overhead and the level of complexity was unnecessary.&nbsp;</p><h2>Monotonic Read Consistency&nbsp;</h2><p>They finally settled on monotonic read consistency based on their use case and with acceptable trade-offs. Monotonic read consistency gives each client a consistent timeline of reads: the data may be slightly stale, but successive reads never go back in time.</p><p>The key to implementing this consistency was to route the client requests to the same DB node for the subsequent queries. When a certain client always hit the same DB node, they would always get consistent data with respect to that node.&nbsp;</p><p>Initially, when the client sent a request to the database cluster, the system, with some considerations like latency, load balancing, etc., randomly sent the request to the DB nodes.</p><p>But now, since the requests had to be routed to the same node for the subsequent reads, the logic of routing requests had to be tweaked.&nbsp;</p><h2>Adding A Unique Identifier To Every Client Request</h2><p>To achieve this, Shopify added a UUID (Universally Unique Identifier) to every request. This ID is passed with every client request to enable the system to identify a series of requests from a certain client.</p><blockquote><p>UUIDs are leveraged in distributed systems to uniquely identify information in a distributed computing environment. 
With UUIDs, we can treat the information as globally unique in practice, since it's highly unlikely that two UUIDs will be the same.&nbsp;</p><p>In distributed systems, they are used in multiple use cases, such as keys for distributed DB records, session IDs, security tokens, and unique identifiers for messages, transactions, traces, and so on.&nbsp;</p></blockquote><p>Shopify attached the UUID to every request and sent it within the query comments as a key-value pair:</p><pre><code>/* consistent_read_id:&lt;some unique ID&gt; */ SELECT &lt;fields&gt; FROM &lt;table&gt;</code></pre><p>Why the UUID is passed in the query comments as opposed to being a part of the query is not explained in their <a href="https://shopify.engineering/read-consistency-database-replicas">engineering post</a>. Maybe, based on their system, it would be easier to parse and extract the UUID from the comments or the comments could hold more metadata, such as user IDs, timestamps, etc., when required in the near future.&nbsp;</p><h2>Hashing The UUID And Routing The Client Requests To The Same DB Node</h2><p>To ensure that requests carrying a certain UUID always hit the same DB node for read consistency, the UUID (consistent read ID for every client request) is hashed to get an integer value. A modulo operation is then performed on this integer, using the number of nodes as the divisor. </p><p>The result of the modulo is the index of the DB node that every request with that specific UUID will hit. 
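</p><p>The hash-and-modulo routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <code>pick_replica</code>, the choice of SHA-256 and the replica count of 4 are hypothetical and are not taken from Shopify's engineering post, which does not say which hash function they use.</p><pre><code>import hashlib
import uuid


def pick_replica(consistent_read_id, replica_count):
    """Return the index of the replica that should serve this read ID.

    Hypothetical helper for illustration; not Shopify's actual code.
    """
    # Hash the ID with a stable function so every app server agrees on the result.
    digest = hashlib.sha256(consistent_read_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    hashed = int.from_bytes(digest[:8], "big")
    # The modulo maps the integer to an index from 0 to replica_count - 1.
    return hashed % replica_count


# One ID is shared by a series of related requests (assumed usage).
read_id = str(uuid.uuid4())
first = pick_replica(read_id, 4)
second = pick_replica(read_id, 4)
assert first == second  # the same ID always maps to the same replica index
</code></pre><p>The sketch uses a stable hash instead of Python's built-in <code>hash()</code>, since the built-in is randomized per process for strings and could send the same ID to different nodes from different application servers. Also note that with plain modulo routing, changing the number of nodes remaps most IDs to a different node.</p><p>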
This is how Shopify ensured consistent reads across their database replicas.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Quco!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Quco!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png 424w, https://substackcdn.com/image/fetch/$s_!Quco!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png 848w, https://substackcdn.com/image/fetch/$s_!Quco!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png 1272w, https://substackcdn.com/image/fetch/$s_!Quco!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Quco!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png" width="958" height="457" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:457,&quot;width&quot;:958,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Hash-based routing&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Hash-based routing" title="Hash-based routing" srcset="https://substackcdn.com/image/fetch/$s_!Quco!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png 424w, https://substackcdn.com/image/fetch/$s_!Quco!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png 848w, https://substackcdn.com/image/fetch/$s_!Quco!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png 1272w, https://substackcdn.com/image/fetch/$s_!Quco!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9f582a9-727b-4920-bc0a-1c78f60b360f_958x457.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Source: <a href="https://shopify.engineering/read-consistency-database-replicas">Shopify Engineering</a></p><h2>Key System Design Learning From This Case Study</h2><p>This post focuses on how Shopify weighed the trade-offs across different consistency levels, picked monotonic read consistency as it fit their business requirements, and eventually implemented it.&nbsp;</p><p>To ensure the clients' requests with unique IDs hit the same DB nodes for subsequent requests, they leveraged a technique called hash-based routing or hash-based load balancing, as I discussed above.&nbsp;</p><h2>Hash-based Routing/Load Balancing&nbsp;</h2><p>Hash-based routing spreads incoming requests across a cluster of servers, in addition to routing data or requests to the same server nodes in a deterministic fashion. 
The process is the same as what Shopify implemented in their system, ensuring read consistency by routing client requests with unique IDs to the same servers.</p><p>It involves feeding a certain unique request attribute into the hash function to generate a hash value, which is then mapped to a certain server by taking the modulo of the hash with the total number of servers.&nbsp;</p><p>The chosen hash function is deterministic and always produces the same hash value for the same request attribute, ensuring that the request always gets routed to the same server.&nbsp;In addition, the even distribution of requests across the cluster depends on the hash function as well.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u5l2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u5l2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png 424w, https://substackcdn.com/image/fetch/$s_!u5l2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png 848w, https://substackcdn.com/image/fetch/$s_!u5l2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png 1272w, 
https://substackcdn.com/image/fetch/$s_!u5l2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u5l2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png" width="1456" height="852" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:852,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:772412,&quot;alt&quot;:&quot;Hash-based routing&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Hash-based routing" title="Hash-based routing" srcset="https://substackcdn.com/image/fetch/$s_!u5l2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png 424w, https://substackcdn.com/image/fetch/$s_!u5l2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png 848w, https://substackcdn.com/image/fetch/$s_!u5l2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png 1272w, 
https://substackcdn.com/image/fetch/$s_!u5l2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b91812-0780-41ea-9789-3263e91a36bd_5206x3045.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Load balancers often leverage this technique to implement sticky sessions. Sticky sessions ensure that for a certain client session, all the requests are routed to the same server. 
The unique request attribute here is the session ID: every request carrying it, including the subsequent requests from the same client, gets routed to the same server holding the session state for that particular client.</p><p>The hash-based routing technique works well to route the requests to the same servers but has an issue if servers get added to or removed from the cluster dynamically. In this scenario, the system will route a request to a different server not only when the server that was supposed to receive it goes down: since the divisor of the modulo changes, most other requests get rerouted as well.</p><p>When the nodes get updated dynamically, the modulo operation effectively recalculates the complete request mapping, breaking the affinity between clients and the servers they were hitting. And this happens continually as the servers get updated in real time.&nbsp;</p><p>To tackle this, we leverage another hashing technique called consistent hashing.&nbsp;</p><h2>Consistent Hashing&nbsp;</h2><p>Consistent hashing is another hashing technique that distributes requests or data across a cluster of servers, minimizing the need for remapping when the number of servers changes dynamically. This provides more stability to request routing in a dynamic environment.&nbsp;</p><p>This is achieved by arranging the servers in a virtual ring called the hash ring. Servers are arranged or mapped on the hash ring by feeding their unique identifier (it can be their network address or another distinctive attribute) into a hash function.&nbsp;</p><p>The modulo of the generated hash value is taken with the total number of positions or slots in the ring to map it onto the hash ring. The position of each server/node on the ring is determined by its hash value. The process of calculating the position via modulo is the same as in the hash-based routing approach. 
The difference is that in this scenario, both the nodes and the requests are hashed and mapped onto the ring.</p><p>The total number of positions or slots in the ring is based on the design considerations of our consistent hashing implementation. A higher number of positions ensures a fine-grained distribution, but this may add compute overhead during the lookup process. In contrast, if there are fewer positions, the computations will be fewer, but the load balancing might be uneven.&nbsp;</p><p>One common approach is to set the number of positions based on the number of server nodes available. So, for instance, if we have 30 servers in our system, we might set the total number of positions to three times the number of servers; that would be 90 positions/slots.&nbsp;</p><p>It's more about experimentation and evaluation during implementation and seeing what fits best.</p><p>Similar to how the nodes are mapped onto the hash ring, the unique attribute of the client request is fed into the hash function to generate a hash. This hash value is then mapped onto the same hash ring using the same modulo approach.</p><p>When the client sends a request, the system maps the request to the server by finding the position on the ring where the hash value of the request falls and then moving clockwise and locating the nearest server in the ring.&nbsp;<br><br>So, as opposed to a request directly hitting a certain DB node based on the modulo output, like in hash-based routing, in consistent hashing, it's more about locating the hash location in the ring and finding the nearest server in the ring. 
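</p><p>The ring lookup described above can be sketched in a few lines of Python. Treat it as a minimal illustration under my own assumptions, not a production design: the ring size RING_SLOTS, the node names, the class name HashRing, and the use of SHA-256 are all hypothetical, and there are no virtual nodes or replication. An empty ring is not handled.</p><pre><code>import bisect
import hashlib

RING_SLOTS = 2 ** 16  # assumed ring size, purely illustrative


def slot_for(key):
    """Hash a key and take the modulo to get its position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SLOTS


class HashRing:
    def __init__(self, nodes):
        self._slots = []   # sorted ring positions that hold a node
        self._owner = {}   # ring position to node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        slot = slot_for(node)
        # Linear probing keeps the sketch correct if two nodes collide.
        while slot in self._owner:
            slot = (slot + 1) % RING_SLOTS
        self._owner[slot] = node
        bisect.insort(self._slots, slot)

    def remove(self, node):
        slot = next(s for s, n in self._owner.items() if n == node)
        del self._owner[slot]
        self._slots.remove(slot)

    def lookup(self, request_key):
        """Walk clockwise from the request's slot to the nearest node."""
        index = bisect.bisect_left(self._slots, slot_for(request_key))
        # Past the last node means wrapping around to the first one.
        return self._owner[self._slots[index % len(self._slots)]]


ring = HashRing(["db-node-a", "db-node-b", "db-node-c", "db-node-d"])
keys = ["request-" + str(i) for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("db-node-c")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys that db-node-c owned should have moved.
print(len(moved), all(before[k] == "db-node-c" for k in moved))</code></pre><p>Removing a node only reassigns the keys that node owned; every other key keeps hitting the node it was hitting before, which is the property that plain modulo routing lacks.</p><p>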
This decouples the requests and the DB nodes, making the system more dynamic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rsOh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rsOh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png 424w, https://substackcdn.com/image/fetch/$s_!rsOh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png 848w, https://substackcdn.com/image/fetch/$s_!rsOh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png 1272w, https://substackcdn.com/image/fetch/$s_!rsOh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rsOh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png" width="1456" height="808" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:808,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:905901,&quot;alt&quot;:&quot;Consistent Hashing&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Consistent Hashing" title="Consistent Hashing" srcset="https://substackcdn.com/image/fetch/$s_!rsOh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png 424w, https://substackcdn.com/image/fetch/$s_!rsOh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png 848w, https://substackcdn.com/image/fetch/$s_!rsOh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png 1272w, https://substackcdn.com/image/fetch/$s_!rsOh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe38dd3e7-6475-4646-845d-a5f420ba15d2_5815x3228.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>When a node is added or removed, the system updates only the position of that node in the ring using the same hashing approach, without the need to remap all the servers in the cluster. Only the requests that were mapped to the affected part of the ring move to a different node; most requests are still routed to the same nodes.&nbsp;</p><p>In real-world systems, the servers involved in consistent hashing are monitored in real time to avoid any uneven distribution of load. If the load turns out to be uneven, the ring is reconfigured for a more even distribution. 
Moreover, it is essential that the chosen hash function is deterministic and evenly distributes hash values across the ring.</p><p>The upside of this approach, in contrast to hash-based routing, is that it adapts to the cluster changes, which is critical in running real-world systems.&nbsp;</p><p>The hash-based routing that Shopify leveraged fits well in static environments where the number of servers is fixed or doesn't change often.&nbsp;</p><h2>Consistent Hashing In Distributed Systems&nbsp;</h2><p>Consistent hashing is extensively used in real-world distributed systems. Slack leverages it in its messaging architecture to route channel messages to different nodes in the cluster. Each channel has a unique ID, which is hashed to assign the channel to a node in the cluster. This ensures the messages for a given channel are handled by the same server, ensuring consistency and minimal network consumption.</p><p>I've done a case study on <a href="https://scaleyourapp.com/system-design-case-study-real-time-messaging-architecture/">Slack's real-time messaging architecture on my blog here</a>.&nbsp;</p><p>Consistent hashing is leveraged in several distributed systems use cases: balancing load across a cluster (we've already discussed this), data partitioning (data can be partitioned across multiple nodes in a distributed database), evenly distributing cache data across the nodes of a distributed cache, distributing CDN data across the edge servers, task assignment in parallel processing, and so on.&nbsp;</p><p>Here are a couple more real-world use cases<strong>:</strong>&nbsp;<a href="https://engineering.atspotify.com/2013/02/in-praise-of-boring-technology/">Spotify leverages consistent hashing</a> to distribute the load of millions of users across their servers. 
<a href="https://discord.com/blog/how-discord-scaled-elixir-to-5-000-000-concurrent-users">Discord leverages it to scale to millions of users</a>. In addition, distributed systems such as Couchbase, Riak, DynamoDB, Cassandra and many more leverage the technique to partition data across their distributed nodes.</p><blockquote><p>If you wish to delve into the fundamentals of how distributed services work, how large-scale services manage their database growth, how they deal with global concurrent traffic and distributed data conflicts, including how they are deployed globally and much more, check out the&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency</a>&nbsp;learning path.&nbsp;</p><p>It's a series of three courses authored by me intended to help you master the fundamentals and the intricacies of designing distributed systems like ESPN, Netflix, YouTube, and more.</p></blockquote><blockquote><p>Additionally, if you wish to learn to code distributed systems from the bare bones, I am running a series on it in this newsletter. <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">Do check it out here</a>.&nbsp;</p><p>If you wish to practice coding distributed systems like Redis, Docker, Git, a DNS server and more from the bare bones in the programming language of your choice, check out&nbsp;<a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>&nbsp;(Affiliate). 
With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it&#8217;s done.&nbsp;</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase.</p></blockquote><p>If you found this newsletter post helpful, consider sharing it with your friends for more reach. If you are reading the <a href="https://shivangsnewsletter.com/">web version of this post</a>, consider subscribing to get my posts delivered to your inbox as soon as they are published.</p><p>Implementing consistency via GTID (Global Transaction Identifier) is another interesting topic I'll be delving into in my future posts. It's been added to my list.</p><p>You&#8217;ll find the previous system design case study here:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;60f7ced3-d83d-473a-a936-44253c9488e8&quot;,&quot;caption&quot;:&quot;Picture a scenario where we launch an online multiplayer card game based on a regional fictional character. 
The game enables players to trade cards, explore new and unique cards in the system, purchase them, participate in a battle royale mode, and so on.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;System Design Case Study#3: Distributing Our Database In Different Cloud Regions Globally To Manage Load &amp; Latency&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:26351479,&quot;name&quot;:&quot;Shivang Sarawagi&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/770fbb16-c4a1-4351-9ca6-e1381dff7dc1_1536x2048.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-01-14T13:56:32.666Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://shivangsnewsletter.com/p/distributed-database&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:140668560,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Web Scale&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770fbb16-c4a1-4351-9ca6-e1381dff7dc1_1536x2048.jpeg&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>You can get a 50% discount on my courses by sharing my posts with your network. Based on referrals, you can unlock course discounts. 
Check out the&nbsp;<a href="https://shivangsnewsletter.com/leaderboard">leaderboard page</a>&nbsp;for details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lz4G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lz4G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!Lz4G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!Lz4G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!Lz4G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Lz4G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png" width="982" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:982,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Lz4G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!Lz4G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!Lz4G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!Lz4G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76787dba-b2e4-44c9-b8f7-953151a6d0d7_982x367.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a>&nbsp;as well. I'll see you in the next post. Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[Understanding Database Consistency Levels And Applying Them To A Single Web Service]]></title><description><![CDATA[In my former newsletter post, I discussed how we can distribute our database across different cloud regions to manage the load and latency of our service when being hit with concurrent global traffic in the scale of millions.]]></description><link>https://shivangsnewsletter.com/p/understanding-database-consistency</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/understanding-database-consistency</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Sat, 20 Jan 2024 14:46:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1yAm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my former newsletter post, I discussed how we can <a href="https://shivangsnewsletter.com/p/distributed-database">distribute our database across different cloud regions</a> to manage the load and latency of our service when being hit with concurrent global traffic in the scale of millions.&nbsp;</p><p>When the database nodes are spread across different cloud regions and availability zones across the globe, a trade-off arises between consistency and availability (CAP Theorem) and low latency and consistency (<a href="https://en.wikipedia.org/wiki/PACELC_theorem">PACELC theorem</a>: an extension to the CAP 
Theorem).</p><p>In this post, I will discuss different database consistency levels and how they apply to a real-world messaging service like Discord.&nbsp;</p><p></p><h2>Database Consistency Levels&nbsp;</h2><p>The two standard consistency levels that nearly all distributed databases support are strong and eventual consistency. Besides these two, some databases offer other tunable consistency levels, such as causal consistency, read-your-writes consistency, monotonic read consistency, monotonic write consistency, and session consistency.&nbsp;</p><p>I'll begin the discussion with strong and eventual consistency and will then get into the other consistency levels.&nbsp;</p><h2>Strong Consistency</h2><p>A service, system, or application is said to be strongly consistent when all its users, across cloud regions globally, see the same value of an entity at any point in time.&nbsp;</p><p>Strong consistency is crucial in use cases such as online trading, online booking systems, banking, and healthcare.</p><p>For instance, in our messaging service, if all the users of a certain channel or group across the globe must see the same message edits and deletions at any point in time, we need to make that functionality strongly consistent.&nbsp;</p><p>Without strong consistency, users might experience discrepancies in message content. 
Some nodes will show the edited message, while others may show the original unedited version, with users ending up having different interpretations of their conversations at a point in time.&nbsp;</p><p>Strong consistency guarantees that all users, regardless of their cloud region or the node they are connected to, see the same version of edited or deleted messages.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1yAm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1yAm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png 424w, https://substackcdn.com/image/fetch/$s_!1yAm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png 848w, https://substackcdn.com/image/fetch/$s_!1yAm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png 1272w, https://substackcdn.com/image/fetch/$s_!1yAm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1yAm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png" width="1456" height="829" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:829,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1949297,&quot;alt&quot;:&quot;Strong consistency&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Strong consistency" title="Strong consistency" srcset="https://substackcdn.com/image/fetch/$s_!1yAm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png 424w, https://substackcdn.com/image/fetch/$s_!1yAm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png 848w, https://substackcdn.com/image/fetch/$s_!1yAm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png 1272w, https://substackcdn.com/image/fetch/$s_!1yAm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c8d9d25-b437-4e1b-821f-f30faecb744c_7735x4405.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This happens when the system acknowledges an edit only after it has been successfully replicated to all the distributed nodes of the system across cloud regions.&nbsp;</p><p>Ensuring strong consistency in a system deployed across cloud regions is tricky since it involves trade-offs between consistency, latency and availability.&nbsp;</p><p>In this scenario, ensuring strong consistency globally means an edit is displayed only once it has been replicated across all the cloud regions. This replication takes some time (the replication lag), which means users see message edits after a delay rather than with ultra-low latency.&nbsp;</p><p>Also, if multiple users, as opposed to just one, were editing the same entity concurrently in real time, things would get trickier. 
In this scenario, we would have to restrict writes to a single/few cloud regions or only to some availability zones to ensure strong consistency.&nbsp;</p><p></p><h2>Eventual Consistency&nbsp;</h2><p>If the same message editing feature were eventually consistent, then after the edit, users of the same cloud region would immediately see the updated message, while the users of distant cloud regions would see the updated message after the edit is replicated across nodes in those cloud regions after a short while. Meanwhile, they will see the stale value of the message since the database nodes are not locked for reads to ensure strong consistency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-gAH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-gAH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png 424w, https://substackcdn.com/image/fetch/$s_!-gAH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png 848w, https://substackcdn.com/image/fetch/$s_!-gAH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png 1272w, https://substackcdn.com/image/fetch/$s_!-gAH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-gAH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png" width="1456" height="861" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:861,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2029154,&quot;alt&quot;:&quot;Eventual consistency&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Eventual consistency" title="Eventual consistency" srcset="https://substackcdn.com/image/fetch/$s_!-gAH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png 424w, https://substackcdn.com/image/fetch/$s_!-gAH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png 848w, https://substackcdn.com/image/fetch/$s_!-gAH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png 1272w, https://substackcdn.com/image/fetch/$s_!-gAH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36934ce-722b-49bc-83f9-8206cd4c1cfa_7735x4573.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In this case, different users in a certain channel or a group will see different versions (original or edited) of a certain message at a point in time. The message content will eventually be consistent globally after replication.&nbsp;</p><p>However, eventual consistency enables the system to be available at the cost of consistency. 
For instance, if multiple users were updating the same entity concurrently, they could do so without the system having to lock nodes or route the writes to a single cloud region to ensure strong consistency.&nbsp;</p><p>With strong consistency, depending on the use case, we have to either lock the nodes against reads and writes or queue the writes while the system replicates data globally. Users cannot perform writes in real time until the replication is complete.</p><p>For a deeper understanding, I recommend going through my <a href="https://shivangsnewsletter.com/p/distributed-database">distributed database post</a>.</p><p></p><h2>Causal Consistency&nbsp;</h2><p>Causal consistency preserves the order of causally related updates. Imagine a message in our channel that has several replies.&nbsp;</p><p>Preserving the order of replies is essential for the user experience because a reply to that message may trigger further replies in reaction. Mixing up the order of replies would leave the users confused.</p><p>This feature needs to be causally consistent. 
It guarantees that the order of the messages and their replies will be preserved in our messaging channels.</p><p>If you are wondering how distributed systems maintain the ordering of messages, there are various techniques, such as logical clocks (vector clocks being one kind), distributed commit algorithms, and quorum-based systems.&nbsp;</p><p>Vector clocks are a common technique for tracking the causal ordering of events across the nodes in a distributed system.&nbsp;</p><p></p><h2>Read Your Writes Consistency&nbsp;</h2><p>Read-your-writes consistency ensures that once a user has performed a write in the system, they will always see that write in their subsequent reads.</p><p>But why would a user not see their own write immediately after making it?</p><p>In a master-replica DB setup, read replicas reduce the read load on the DB by handling the read requests, whereas all the write operations happen on the master node.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vtzc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vtzc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png 424w, https://substackcdn.com/image/fetch/$s_!Vtzc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png 848w, 
https://substackcdn.com/image/fetch/$s_!Vtzc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png 1272w, https://substackcdn.com/image/fetch/$s_!Vtzc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vtzc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png" width="1456" height="829" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:829,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:632042,&quot;alt&quot;:&quot;Read your write consistency&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Read your write consistency" title="Read your write consistency" srcset="https://substackcdn.com/image/fetch/$s_!Vtzc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png 424w, https://substackcdn.com/image/fetch/$s_!Vtzc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png 848w, 
https://substackcdn.com/image/fetch/$s_!Vtzc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png 1272w, https://substackcdn.com/image/fetch/$s_!Vtzc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59334aa7-3d9d-416c-92da-a19f6f73f7ab_5206x2964.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When the user performs the write operation, the write happens on the master node and the updates are replicated to the read replica 
nodes either synchronously or asynchronously. The time it takes for a write on the master node to reach a replica is called the replication lag.&nbsp;</p><p>During this window, if the user's read request hits a replica that has not yet received the update, they may not see their latest write.&nbsp;</p><p>But when the system ensures read-your-writes consistency, the user always sees their own writes. One way to achieve this is to update the read replicas synchronously when the write happens, so that all the read replicas are consistent with the master node. However, this adds latency to the system since the write operation is not considered complete until the read replicas are updated. So, the response latency will include the replication time.</p><p>When the system is not read-your-writes consistent, a user who edits a message in our messaging service may not see the edit immediately afterwards. Or a user who joins a certain channel may not immediately see themselves as a member of the channel due to the replication lag. This may prompt the user to repeat the write or leave them confused, which is not a desirable user experience.</p><p></p><h2>Monotonic Reads Consistency&nbsp;</h2><p>Monotonic read consistency ensures that once a user has read a certain value of an entity, subsequent reads return either the same value or a newer one (if available). They will never see a value of that entity older than what they have already seen.&nbsp;</p><p>Applying this consistency level to our messaging channel use case ensures that the users will always see the same or newer versions of the messages, never an older one.</p><p>Similarly, once a user has seen that a message is deleted, no subsequent read by that user will return the original message. 
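</p><p>As a minimal sketch of the monotonic read guarantee described here (in Python; the <code>Replica</code> and <code>MonotonicReader</code> classes, the version numbers and the message id <code>m1</code> are hypothetical names made up for illustration, not any real database's API): the reader remembers the newest version it has returned for each message and never returns anything older, even when a later read lands on a lagging replica.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A read replica holding (version, text) per message id."""
    data: dict = field(default_factory=dict)

    def read(self, message_id):
        # Version 0 means this replica has no copy of the message yet.
        return self.data.get(message_id, (0, None))


@dataclass
class MonotonicReader:
    """Per-user read path that never returns a version older than one already seen."""
    replicas: list
    last_seen: dict = field(default_factory=dict)  # message_id -> (version, text)

    def read(self, message_id, replica_index):
        version, text = self.replicas[replica_index].read(message_id)
        seen_version, seen_text = self.last_seen.get(message_id, (0, None))
        if seen_version > version:
            # The chosen replica is lagging: fall back to the newest value seen so far.
            return seen_version, seen_text
        self.last_seen[message_id] = (version, text)
        return version, text


fresh = Replica({"m1": (2, "hello (edited)")})
stale = Replica({"m1": (1, "hello")})
reader = MonotonicReader([fresh, stale])

first = reader.read("m1", replica_index=0)   # served by the up-to-date replica
second = reader.read("m1", replica_index=1)  # lagging replica, value must not go backwards
```
</pre>
<p>Falling back to the last value seen is only one way to keep reads monotonic; a system could equally pin a user's session to one replica or retry the read on a fresher replica.</p><p>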
The user will consistently see the message as deleted.</p><p></p><h2>Monotonic Writes Consistency&nbsp;</h2><p>Similar to monotonic reads, the monotonic writes consistency level ensures that the writes made by a user are applied in the same order in which they were issued, so a later write is never overwritten by an earlier one.</p><p>Applying this consistency level to our messaging channel use case ensures that when a user edits a message multiple times, the edits are applied in the order they were made, and the message ends up reflecting the latest edit.</p><p>Isn't the monotonic read-write consistency model similar to the causal consistency we discussed before? Both maintain the order of operations, right?</p><h2>Difference Between Monotonic Read-Write and Causal Consistency</h2><p>The difference is subtle. Monotonic read-write consistency preserves the order of operations for a single process or user. In contrast, causal consistency ensures that causally related operations keep their order across all processes or users in the system. Unrelated operations may be observed in different orders, as long as the causally related operations are observed in the correct order.</p><p>So, for instance, in a monotonically consistent system, if a user sends three messages, A, B &amp; C, the system will ensure the user sees the messages in the same order (A, B, C).&nbsp;</p><p>In a causally consistent system, if a user sends a message A, another user replies to it with message A1, and the first user, having seen the reply, then sends a message B, the system will ensure that all users observe A before A1 and A1 before B, maintaining the causal relationship. Messages that are not causally related carry no such ordering guarantee.</p><p>Monotonic consistency is typically discussed in the context of events performed by a single user. 
In contrast, causal consistency is addressed in the context of all the users in the system.&nbsp;</p><p></p><h2>Session Consistency&nbsp;</h2><p>The session consistency model guarantees a consistent order of operations performed by a user within a single user session. It does not guarantee the order of operations outside a user session or at a global level.</p><p>So, for instance, if a user edits a message multiple times within a single user session, they would see the edits consistently in the order they were made.&nbsp;</p><p>Another use case for this in our messaging service is chat sessions. Applying session consistency to users' chat sessions ensures the correct order of messages within a specific user's chat session. This enables users to follow a consistent conversation flow.&nbsp;</p><p>The system can assign unique sequence numbers or timestamps to the user's messages for consistent ordering within the session. When a user retrieves their chat history, the messages are displayed in order of their sequence numbers or timestamps.&nbsp;</p><p>Since the chat involves multiple users, the ordering of messages in session consistency is guaranteed only within the respective user sessions. Network delays or variations in message delivery by the message broker may cause different users to see a slightly different message order.&nbsp;</p><p>A real-world complex distributed service may have varying consistency requirements based on the service needs and different use cases. This is where different consistency levels come in handy. We may have scenarios where we want to provide stronger consistency within a user session and more relaxed consistency at a global level. 
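</p><p>Going back to the sequence numbers mentioned above for chat sessions, here is a rough sketch of per-session ordering (in Python; the <code>ChatSession</code> class, its methods and the message dictionaries are hypothetical and only illustrate the idea, not how any particular messaging service implements it). Each session stamps its own outgoing messages, and the history is sorted by that stamp regardless of the order in which the broker delivered them.</p>
<pre>
```python
import itertools


class ChatSession:
    """One user's chat session; stamps each outgoing message with a session-local sequence number."""

    def __init__(self, user):
        self.user = user
        self._next_seq = itertools.count(1)  # 1, 2, 3, ... within this session only

    def send(self, text):
        return {"user": self.user, "seq": next(self._next_seq), "text": text}

    def ordered_history(self, received):
        # Keep only this user's messages and restore the order in which they were sent.
        mine = [m for m in received if m["user"] == self.user]
        return sorted(mine, key=lambda m: m["seq"])


alice = ChatSession("alice")
bob = ChatSession("bob")
a, b, c = alice.send("A"), alice.send("B"), alice.send("C")
reply = bob.send("A1")

# Simulate the broker delivering the messages out of order.
delivered = [c, reply, a, b]
alice_texts = [m["text"] for m in alice.ordered_history(delivered)]
```
</pre>
<p>Note that the sequence numbers are local to each session, so they say nothing about how one user's messages interleave with another's. That is the global ordering session consistency does not guarantee.</p><p>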
With different consistency levels, we can tune the consistency of each part of our application.&nbsp;</p><p>The choice of consistency levels in a real-world service deployed across different cloud regions is nuanced and often involves considerations specific to each cloud region or availability zone. Different cloud regions may have different latency requirements, network conditions, etc., leading to variations in the consistency guarantees provided.&nbsp;</p><blockquote><p>If you wish to delve into how distributed databases work, how large-scale services manage their database growth, how they deal with global concurrent traffic and distributed data conflicts, including vector clocks, how distributed services are deployed globally and much more, check out the <a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency</a> learning path.&nbsp;</p><p>It's a series of three courses authored by me, intended to help you master the fundamentals and the intricacies of designing distributed systems like ESPN, Netflix, YouTube, and more.</p></blockquote><blockquote><p>Additionally, if you wish to learn to code distributed systems from the bare bones, I am running a series on it in this newsletter. <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">Do check it out here</a>.&nbsp;</p><p>If you wish to practice coding distributed systems like Redis, Docker, Git, a DNS server and more from the bare bones in the programming language of your choice, check out&nbsp;<a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>&nbsp;(Affiliate). 
With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it&#8217;s done.&nbsp;</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase.</p></blockquote><p>If you found the content insightful, do share it with your friends for more reach and consider subscribing to my newsletter if you are reading the <a href="https://shivangsnewsletter.com/">web version of this newsletter post</a>.</p><p>You can get a 50% discount on my courses by sharing my posts with your network. Based on referrals, you can unlock course discounts. 
Check out the <a href="https://shivangsnewsletter.com/leaderboard">leaderboard page</a>  for details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!v7D0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!v7D0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!v7D0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!v7D0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!v7D0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!v7D0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png" width="982" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4244502a-6143-4907-88fa-abcdac207421_982x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:982,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!v7D0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!v7D0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!v7D0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!v7D0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4244502a-6143-4907-88fa-abcdac207421_982x367.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on&nbsp;<a href="https://shivangsnewsletter.com/chat">Substack chat</a>&nbsp;as well.&nbsp;</p><p>I'll see you in the next post. Until then, Cheers!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Web Scale! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[System Design Case Study#3: Distributing Our Database In Different Cloud Regions Globally To Manage Load & Latency]]></title><description><![CDATA[Picture a scenario where we launch an online multiplayer card game based on a regional fictional character.]]></description><link>https://shivangsnewsletter.com/p/distributed-database</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/distributed-database</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Sun, 14 Jan 2024 13:56:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_HHF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Picture a scenario where we launch an online multiplayer card game based on a regional fictional character. The game enables players to trade cards, explore new and unique cards in the system, purchase them, participate in a battle royale mode, and so on.&nbsp;</p><p>Right after the launch, the game starts gaining traction in our country and the neighboring countries. 
Our service is initially deployed in a specific cloud region, for instance, the Asia Pacific, and is doing well from the latency standpoint.&nbsp;</p><p></p><h2>Our MMO (Massively Multiplayer Online) Gaming Service Architecture</h2><p>Here is the oversimplified architecture of our service:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_HHF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_HHF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png 424w, https://substackcdn.com/image/fetch/$s_!_HHF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png 848w, https://substackcdn.com/image/fetch/$s_!_HHF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png 1272w, https://substackcdn.com/image/fetch/$s_!_HHF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_HHF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png" width="1456" height="861" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:861,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:685899,&quot;alt&quot;:&quot;Single cloud region application architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Single cloud region application architecture" title="Single cloud region application architecture" srcset="https://substackcdn.com/image/fetch/$s_!_HHF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png 424w, https://substackcdn.com/image/fetch/$s_!_HHF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png 848w, https://substackcdn.com/image/fetch/$s_!_HHF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png 1272w, https://substackcdn.com/image/fetch/$s_!_HHF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ba3794b-a642-4a27-ad7e-e18de240b589_6017x3558.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We have a CDN at the edge for delivering static assets to the end users at blazing speed. The application server handles the game logic, player interactions, serving dynamic content, etc. And the database stores the game state and the player profile data.</p><p>There can be other components in our architecture as well, like the cloud object store for storing blob content, a search service for finding cards, an additional game server for matching players of the same skill, etc. 
But let's just keep things simple for now.&nbsp;</p><p>The aim of this post is to focus on the different ways our database can be distributed across cloud regions to expand our service globally, rather than to delve deeply into the architecture of an MMO game.&nbsp;</p><p>Over time, since our game is the best card game ever, it starts to gain global recognition and blows up beyond our expectations. Concurrent users from different cloud regions flock to our website.&nbsp;</p><p>Tackling this sudden traffic influx needs architectural and infrastructural changes. Our current infrastructure wasn't built to handle global traffic. Due to the excessive traffic, players in the main cloud region, along with those in the other cloud regions, experience higher response latency.</p><p>Requests from other cloud regions incur additional network latency because they are routed to the main cloud region for reads and writes. Requests originating from the main cloud region slow down as well, because the infrastructure is serving traffic well beyond its capacity.&nbsp;</p><p>And this goes without saying: since latency is a crucial factor in an online game, the spike deteriorates the in-game experience.&nbsp;</p><p>What do we do?</p><p>To deal with this, we distribute our architecture across several cloud regions to cut down the response latency and the load on the main cloud region.&nbsp;</p><p></p><h2>Cross-Cloud Region Distributed Service Architecture&nbsp;</h2><p>We will serve static assets from CDN edge locations in the respective cloud regions. The application servers will be deployed in every cloud region across one or multiple availability zones.&nbsp;</p><p>We will have a global load balancer and regional load balancers. The global load balancer routes traffic across different cloud regions. If a certain cloud region goes down, the global load balancer routes the traffic of that region to the nearest healthy cloud region. 
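</p><p>The failover rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region names, the latency figures and the <code>pick_region</code> helper are hypothetical, and a real global load balancer would typically rely on health checks combined with DNS or anycast routing rather than a hard-coded table.</p><pre><code class="language-python"># Hypothetical inter-region latencies in milliseconds (illustrative values only).
LATENCY_MS = {
    "asia-pacific": {"asia-pacific": 5, "europe": 140, "us-east": 210},
    "europe": {"asia-pacific": 140, "europe": 5, "us-east": 80},
    "us-east": {"asia-pacific": 210, "europe": 80, "us-east": 5},
}


def pick_region(client_region, healthy_regions):
    """Return the healthy region with the lowest latency from the client's region."""
    candidates = [r for r in LATENCY_MS[client_region] if r in healthy_regions]
    if not candidates:
        raise RuntimeError("no healthy cloud region available")
    return min(candidates, key=lambda r: LATENCY_MS[client_region][r])


# All regions healthy: the client is served by its own region.
print(pick_region("europe", {"asia-pacific", "europe", "us-east"}))
# Europe is down: its traffic moves to the nearest healthy region.
print(pick_region("europe", {"asia-pacific", "us-east"}))
</code></pre><p>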
The regional load balancers route traffic across different availability zones and data centers.&nbsp;</p><blockquote><p>If you wish to understand how the request flows through the CDN and load balancers to application servers, check out the <a href="https://scaleyourapp.com/system-design-part-1/">CDN and Load balancers (Understanding the request flow)</a> post I've published on my blog.&nbsp;</p></blockquote><p>What about the database? Do we have cloud-region-specific independent database deployments storing data only for that region? Should we shard our database, spreading out shards in different cloud regions? Or should we have read replicas in different cloud regions just for the reads?</p><p>Well, things are not so straightforward here. This largely depends on the business use case.&nbsp;</p><p></p><h2>Distributing Our Database Across Different Cloud Regions&nbsp;</h2><h2>Read Replicas&nbsp;</h2><p>Read replicas are copies of the primary database (in the main cloud region) deployed in different cloud regions.&nbsp;</p><p>With them, a user's read requests from a certain cloud region don't hit the main cloud region database; instead, they are served by the read replica in that user's own cloud region.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8FkP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8FkP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png 424w, 
https://substackcdn.com/image/fetch/$s_!8FkP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png 848w, https://substackcdn.com/image/fetch/$s_!8FkP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png 1272w, https://substackcdn.com/image/fetch/$s_!8FkP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8FkP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png" width="1456" height="837" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:837,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1442635,&quot;alt&quot;:&quot;Read replicas - Multi cloud region service deployment architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Read replicas - Multi cloud region service deployment architecture" title="Read replicas - Multi cloud region service deployment architecture" 
srcset="https://substackcdn.com/image/fetch/$s_!8FkP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png 424w, https://substackcdn.com/image/fetch/$s_!8FkP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png 848w, https://substackcdn.com/image/fetch/$s_!8FkP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png 1272w, https://substackcdn.com/image/fetch/$s_!8FkP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f6f481-8399-4a13-92dd-c3bb9df055ea_6559x3769.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This reduces read latency significantly, in addition to balancing the database query load globally. Read replicas also hold a near-current copy of the primary database, so they act as a safeguard against data loss in case the primary cloud region faces a disaster.</p><p>All the read replicas are updated asynchronously as the primary database receives application writes. So, we have to account for some replication lag: a replica may briefly serve data that is slightly behind the primary database.</p><p>The replicas can also be updated synchronously to make the system strongly consistent, but then every write has to wait until the replicas acknowledge it, which adds cross-region latency to each write.</p><p>Should we use read replicas in our use case? We'll discuss this, but before that, let's understand the other architectural options.</p><p></p><h2>Region-Specific Independent Distributed Databases</h2><p>If, instead of being read-heavy, most of the global traffic is write-heavy, we would have to deploy cloud-region-specific independent databases to cut down on the write latency and load. 
This would prevent all the write queries from converging on the primary cloud region database.&nbsp;</p><p>Region-specific database deployments can also be scaled individually based on the regional load, thus adding flexibility to our system architecture.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FqeL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FqeL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png 424w, https://substackcdn.com/image/fetch/$s_!FqeL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png 848w, https://substackcdn.com/image/fetch/$s_!FqeL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png 1272w, https://substackcdn.com/image/fetch/$s_!FqeL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FqeL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png" width="1456" height="837" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:837,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1450600,&quot;alt&quot;:&quot;Independent database deployments multi-cloud region&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Independent database deployments multi-cloud region" title="Independent database deployments multi-cloud region" srcset="https://substackcdn.com/image/fetch/$s_!FqeL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png 424w, https://substackcdn.com/image/fetch/$s_!FqeL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png 848w, https://substackcdn.com/image/fetch/$s_!FqeL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png 1272w, https://substackcdn.com/image/fetch/$s_!FqeL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1a646d-6b64-4fae-bf87-63bddbf1bff2_6559x3769.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In a write-heavy scenario, read replicas won't do much good, since all the write queries would still have to travel to the primary cloud region.&nbsp;</p><p>There is one more thing: global services often have to comply with local data laws that typically require them to keep region-specific data within that region. 
This requires us to have region-specific database deployments.&nbsp;</p><p>Region-specific independent database deployments sound good, but they bring along the additional complexity of synchronizing some writes across regions if we are maintaining a global game state (which we typically would).&nbsp;</p><p>For instance, if we maintain player rankings at a global level spanning different cloud regions, the writes made in the respective cloud regions for certain user events need to be synchronized with the primary database in the main cloud region.</p><p>We have to account for the application's data consistency requirements with such an architecture, because concurrent writes across regions are typically eventually consistent rather than strongly consistent, unless we accept the latency cost of coordinating every write across regions.</p><p></p><h2>Synchronizing Data Across Global Nodes Of A Distributed Database</h2><p>As discussed above, the actual complexity lies in synchronizing writes across several nodes of a distributed database deployed across the globe. 
This involves weighing critical factors such as latency requirements, system availability, network issues, data consistency requirements, and so on.&nbsp;</p><p>Different distributed databases, and distributed systems in general, use different techniques to sync data, such as quorum-based reads and writes, consensus algorithms like Raft and Paxos, data versioning with vector clocks, and so on.</p><p>This topic deserves a dedicated article; I have added it to my list and will delve into it in a future post.&nbsp;</p><blockquote><p>If you wish to delve into the details of how large-scale services handle database growth, how they are deployed in different cloud regions across the globe, the criticality of understanding application data and data access patterns when designing distributed systems, the techniques and intricacies of scaling databases, different data models, distributed transactions, how distributed systems handle data conflicts, cloud infrastructure on which our apps are deployed and much more, check out the <a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency</a> learning path.&nbsp;</p><p>It's a series of three courses authored by me intended to help you master the fundamentals and the intricacies of designing distributed systems like ESPN, Netflix, YouTube, and more.</p></blockquote><p></p><h2>Hybrid Distributed Database Architecture With Read Replicas &amp; Region-Specific Independent Active-Active Deployments</h2><p>Running distributed services is complex. We know that. 
There can be any number of use cases warranting a hybrid architecture as opposed to sticking with one specific architecture or deployment strategy.</p><p>In a scenario where certain features of our game are unavailable in certain cloud regions for legal reasons and only read queries originate from those regions, just having read replicas there could work. This will reduce the data sync complexities in our architecture to a certain extent.</p><p>A few of the regions could require data to be kept locally, which means setting up a dedicated regional database there. In this scenario, we need a hybrid architecture that mixes the two and optimizes resource usage. </p><p></p><h2>Sharding Our Database Across Different Cloud Regions&nbsp;</h2><p>Up to this point, we have discussed two ways to reduce hits on the primary database in the main cloud region. One is to have region-specific read replicas, and the other is to have independent cloud-region-specific database deployments.&nbsp;</p><p>There is one more way to configure or distribute our database across the globe, which is by sharding it across different cloud regions.&nbsp;</p><p>This is different from independent deployments. Sharding involves breaking a big database down into smaller chunks called shards to reduce response times and make the data easier to manage.&nbsp;</p><p>In this setup, the shards are globally distributed across different cloud regions, making the architecture more scalable and flexible. 
Depending on our use case, the data can be sharded by geography, user attributes, user involvement in certain game features, or some other key.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B4lr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B4lr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png 424w, https://substackcdn.com/image/fetch/$s_!B4lr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png 848w, https://substackcdn.com/image/fetch/$s_!B4lr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png 1272w, https://substackcdn.com/image/fetch/$s_!B4lr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B4lr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png" width="1456" height="837" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:837,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1434357,&quot;alt&quot;:&quot;Database sharding multi-cloud region&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Database sharding multi-cloud region" title="Database sharding multi-cloud region" srcset="https://substackcdn.com/image/fetch/$s_!B4lr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png 424w, https://substackcdn.com/image/fetch/$s_!B4lr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png 848w, https://substackcdn.com/image/fetch/$s_!B4lr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png 1272w, https://substackcdn.com/image/fetch/$s_!B4lr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01592702-0ca9-44b7-ae05-98361c3622d0_6559x3769.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In a sharded architecture, a shard in a certain cloud region can receive hits from a different cloud region since the database is sharded as opposed to just holding data of a specific cloud region.&nbsp;</p><p>A shard, as opposed to acting independently only for a certain specific cloud region and containing data only for that region, is a logical part of a bigger database. 
The player data and game state of a certain cloud region may be stored in the primary cloud region database, depending on the application requirements.</p><p></p><h2>Picking the Right Database Distribution Strategy</h2><p>Which of the three architectural strategies to pick (read replicas, cloud-region-specific independent database deployments, or sharding the database across cloud regions), or whether to go with a hybrid architecture (a mix of two or all three), largely depends on our business requirements.&nbsp;</p><p>When designing our system, several factors need to be considered, such as traffic patterns, application consistency requirements, latency tolerance, operational costs and overhead, local data compliance, etc.&nbsp;</p><p>If most global traffic is read-heavy and write scenarios are not so latency-sensitive, read replicas could be a good fit.&nbsp;</p><p>To comply with local data laws, we may have to move our data from the main cloud region to the respective cloud regions and availability zones. In this scenario, independent cloud-region-specific deployments will come into play.&nbsp;</p><p>If the use case is latency-sensitive, we have no option other than to move the data close to the users, either via regional sharding or independent deployments.&nbsp;</p><blockquote><p>Speaking of low-latency data access, managed serverless databases are picking up in developer circles of late. They manage the infrastructure, including connection pooling, which is a crucial element when handling large concurrent traffic. They offer ready-made data access APIs and are a good fit for thick client services. I'll be discussing them in my future posts. Stay tuned.&nbsp;</p></blockquote><p></p><h2>Understanding Trade-offs In System Architecture</h2><p>To dig further into picking the right database distribution strategy, let's quickly go over a scenario from our card gaming service, deployed, for instance, in four cloud regions. 
Our game&#8217;s battle royale mode involves players battling in real time. This calls for super-low-latency responses to user events to keep the engagement and in-game experience high.&nbsp;</p><p>We can set up region-specific databases to ensure low latency. But at the same time, we also have to deploy a feature that provides strongly consistent global user rankings in real time as the players complete in-game events, to keep them motivated and playing.&nbsp;</p><p>This would require our globally deployed system to be strongly consistent. This means confining writes to a few cloud regions or availability zones to reduce eventually consistent data as much as possible.</p><p>Now, this is a trade-off between maintaining strong system consistency and ensuring low system latency and availability.&nbsp;</p><p>P.S. We are not considering adherence to local data laws in this scenario.<br><br>We need to spread out writes globally to ensure super-low latency, but at the same time, we have to confine writes to a few regions or a single region to keep the data strongly consistent. This is a trade-off. Designing this system architecture requires careful thinking.</p><p>So, you see, most architectural decisions are a trade-off. There is no one-size-fits-all. No silver bullet. I want you to tell me in the comments how you would design such a system :)</p><blockquote><p>If you wish to learn to code distributed systems from the bare bones, I am running a series on it in this newsletter. <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">Do check it out here</a>.&nbsp;</p><p>Additionally, if you wish to practice coding distributed systems like Redis, Docker, Git, a DNS server and more from the bare bones in the programming language of your choice, check out&nbsp;<a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>&nbsp;(Affiliate). 
With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it&#8217;s done.&nbsp;</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase.</p></blockquote><p>If you found the content insightful, do share it with your friends for more reach and consider subscribing to my newsletter if you are reading the <a href="https://shivangsnewsletter.com/">web version of this newsletter post</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/subscribe?"><span>Subscribe now</span></a></p><p><br>You&#8217;ll find the previous system design case study here:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;4cad8586-3b8b-427b-9ca3-ef8a7b334a2b&quot;,&quot;caption&quot;:&quot;Picture a scenario where we need to build a product or a food item image processing feature for our food aggregator app like Swiggy, Zomato or Uber Eats. 
The feature should enable restaurants to upload images of food items, which are then processed (deduplicated, compressed, and stored) on the backend to be displayed to the end users.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;System Design Case Study #2: Building An Image Processing Pipeline, Weeding Out Duplicates With Content Addressable Storage &amp; How Uber Eats DeDuplicates Their Images&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:26351479,&quot;name&quot;:&quot;Shivang Sarawagi&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/770fbb16-c4a1-4351-9ca6-e1381dff7dc1_1536x2048.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-01-02T09:45:45.575Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://shivangsnewsletter.com/p/image-processing-pipeline&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:140266175,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:5,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Web Scale&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770fbb16-c4a1-4351-9ca6-e1381dff7dc1_1536x2048.jpeg&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>You can get a 50% discount on my courses by sharing my posts with your network. 
Based on referrals, you can unlock course discounts. Check out the&nbsp;<a href="https://shivangsnewsletter.com/leaderboard">leaderboard page</a>&nbsp;for details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yFqd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yFqd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!yFqd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!yFqd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!yFqd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yFqd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png" width="982" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c374e907-88c5-4d53-beac-b373732466cd_982x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:982,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yFqd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!yFqd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!yFqd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!yFqd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc374e907-88c5-4d53-beac-b373732466cd_982x367.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/p/distributed-database?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/p/distributed-database?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>&nbsp;and can chat with me on <a href="https://shivangsnewsletter.com/chat">Substack chat</a> as well.</p><p>I'll see you in the next post. 
Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[Designing & Developing Reliable Distributed Services - Observability-Driven Development]]></title><description><![CDATA[Picture running a complex distributed service like a global e-commerce website powered by several microservices on the backend, such as the product catalog service, inventory service, order service, payment service, and so on.]]></description><link>https://shivangsnewsletter.com/p/observability-in-distributed-systems</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/observability-in-distributed-systems</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Sun, 07 Jan 2024 07:06:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G4_k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Picture running a complex distributed service like a global e-commerce website powered by several microservices on the backend, such as the product catalog service, inventory service, order service, payment service, and so on. These individual microservices could be composed of more microservices based on the module requirements. 
This gives us an idea of our service architecture complexity.&nbsp;&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G4_k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G4_k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png 424w, https://substackcdn.com/image/fetch/$s_!G4_k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png 848w, https://substackcdn.com/image/fetch/$s_!G4_k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png 1272w, https://substackcdn.com/image/fetch/$s_!G4_k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G4_k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png" width="1456" height="878" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:945090,&quot;alt&quot;:&quot;Ecommerce service system architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Ecommerce service system architecture" title="Ecommerce service system architecture" srcset="https://substackcdn.com/image/fetch/$s_!G4_k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png 424w, https://substackcdn.com/image/fetch/$s_!G4_k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png 848w, https://substackcdn.com/image/fetch/$s_!G4_k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png 1272w, https://substackcdn.com/image/fetch/$s_!G4_k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1d614-b103-42da-bd01-bfd9d999e641_5825x3514.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>At peak times, orders are being created in the system at a healthy rate, but over time, customers start to experience delays in payment processing. A significant percentage of cart payments begins to fail.</p><p>What do we do? How do we pinpoint the bug or the issue in our system? How do we figure out what part or component of our system is facing issues?&nbsp;</p><p>This is where end-to-end system observability saves the day.&nbsp;</p><p>What is observability?</p><p></p><h2>Understanding Observability</h2><p>Observability refers to the degree or level of understanding we have of our distributed system running in production. System observability enables us to analyze the system's behavior and to pinpoint and fix the issues the system experiences in production in minimal time. 
Without it, we would be left in the dark, with no idea of what has gone wrong.&nbsp;</p><p>Distributed services such as massive e-commerce sites, movie streaming platforms, social networks, etc., deployed across the world in different cloud regions and availability zones, are complex in nature. To ensure their smooth functioning, we need to have real-time production insights.</p><p>System observability helps us ensure system reliability, availability, scalability, and much more in distributed systems, which I am going to discuss in this post.&nbsp;</p><p>We'll begin with telemetry, which is a fundamental component of observability.&nbsp;</p><p></p><h2>What Is Telemetry?</h2><p>Telemetry is the automated process of collecting and transmitting insightful data from different parts of a distributed system to a centralized location for monitoring and analysis. This enables the platform, development, and infrastructure teams to understand what went wrong when the system starts to show unpredictable behavior.</p><p>What is this insightful telemetry data I am talking about?</p><p>Telemetry data includes logs, metrics, traces and other relevant contextual information. 
Continually sending telemetry data in real time or at regular intervals from different microservices and other parts of the distributed system to a centralized location is a key part of the continuous monitoring process.&nbsp;</p><p>This helps the teams gain insight into the behavior and performance of the system, enabling them to identify and fix infrastructural issues, in addition to performing future capacity planning.&nbsp;</p><p>Let's take a quick look at logs, metrics and traces.</p><p></p><h2>Telemetry Data (Logs, Metrics, Traces)</h2><h2>Logs</h2><p>When we code an application, along with the main code, we add logs that give us an insight into the code flow during the development phase as well as after the application is deployed in production.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8EmR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8EmR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png 424w, https://substackcdn.com/image/fetch/$s_!8EmR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png 848w, https://substackcdn.com/image/fetch/$s_!8EmR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8EmR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8EmR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png" width="1222" height="358" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:1222,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:55544,&quot;alt&quot;:&quot;Adding logs in code for observability&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Adding logs in code for observability" title="Adding logs in code for observability" srcset="https://substackcdn.com/image/fetch/$s_!8EmR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png 424w, https://substackcdn.com/image/fetch/$s_!8EmR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png 848w, https://substackcdn.com/image/fetch/$s_!8EmR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8EmR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05067fce-c134-4607-b43c-8d3764e5ea87_1222x358.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>During development, logs help us debug the code, and in production, they give us a clear insight into the application flow, helping us understand the system's behavior. This is how we get visibility into the functioning of our software.</p><h2>Metrics&nbsp;</h2><p>Metrics primarily help us gauge the system's performance. 
Typical metrics that we analyze are response times, rate of specific user events, throughput, CPU, memory, disk utilization, network latency, error rates, system availability metrics (typically in %), and so on.&nbsp;</p><p>So, for instance, if I develop a certain application module or feature and deploy it on the server, the logs will help me understand the code flow, and metrics will help me understand the server resource usage. With them, I can gauge the resource consumption of the virtual machines, bare metal servers or cloud platform our workload is hosted on.</p><p>This helps with capacity planning, shows which service features are resource-hungry, and helps ensure our infrastructure has enough capacity to handle peak traffic.&nbsp;</p><p>Besides the infrastructure resource consumption, metrics help us understand the service response times, throughput, user events, error rates, etc., as well.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SQMe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SQMe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png 424w, https://substackcdn.com/image/fetch/$s_!SQMe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png 848w, 
https://substackcdn.com/image/fetch/$s_!SQMe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png 1272w, https://substackcdn.com/image/fetch/$s_!SQMe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SQMe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png" width="1456" height="1326" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1326,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Metrics dashboard in distributed systems&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Metrics dashboard in distributed systems" title="Metrics dashboard in distributed systems" srcset="https://substackcdn.com/image/fetch/$s_!SQMe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png 424w, https://substackcdn.com/image/fetch/$s_!SQMe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png 848w, 
https://substackcdn.com/image/fetch/$s_!SQMe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png 1272w, https://substackcdn.com/image/fetch/$s_!SQMe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73eb7ff0-505f-43ee-98e0-df62621ef2f6_2188x1992.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Img src: <a href="https://grafana.com/docs/grafana/latest/fundamentals/dashboards-overview/">Grafana</a><br><br>The above is the 
image of a Grafana dashboard displaying production metrics.</p><p>There are no strict rules on what events, insights, or data can be termed metrics. Whatever helps us understand our system over a period of time can be deemed a helpful metric.&nbsp;</p><h2>Traces</h2><p>Traces provide information on the flow of requests as they travel through different components in a distributed system. So, if a product purchase request goes from the product catalog service to the inventory service, to the payments service, and so on, traces let us follow the entire flow. This helps us understand how requests move through the system and whether any component is a bottleneck that hurts throughput.&nbsp;</p><p>Traces are crucial in distributed system observability as they provide insights into the end-to-end journey of a request as it travels through different components in the system architecture, such as load balancers, proxies, caches, API gateways, backend servers, databases and so on. The more observability our system has, the better. 
There should ideally be no blind spots.&nbsp;</p><p>To summarize what we have covered so far: logs help us understand a specific part or component of our system; metrics help us understand the behavior and resource consumption of specific components as well as the system as a whole; and traces provide a higher level of end-to-end visibility into the system.&nbsp;</p><p>Besides logs, metrics and traces, there is another element that is key to observability: continuous profiling.&nbsp;</p><p></p><h2>Continuous Profiling</h2><p>Continuous profiling provides deeper insight into the production infrastructure than the telemetry data (logs, metrics, and traces) we discussed above.&nbsp;</p><p>Continuous profiling, which happens in production, is similar to the code profiling and microbenchmarking that we developers do on our local machines before pushing our code to the remote repo.&nbsp;</p><p>If you are hazy on code profiling and microbenchmarking, here is the gist:&nbsp;</p><blockquote><p>Code profiling, with the help of specific tools, helps us measure code performance (for specific modules or the entire codebase) to find performance bottlenecks, excessive resource usage, memory leaks, and other issues. Code profilers collect data during code execution and provide insights into the code behavior.&nbsp;</p><p>Microbenchmarking, in turn, focuses on measuring specific units of code at a very fine-grained level, aiming for high precision. In microbenchmarking, the scope is narrower than in code profiling. 
Specific functions/methods or code snippets are tested in isolation to measure their performance.&nbsp;</p></blockquote><p>Similarly, continuous profiling, in production, provides visibility into data structure memory issues, duration of function calls, memory allocation issues, CPU consumption, disk I/O consumption, etc., at the kernel and userspace levels.</p><p>The process gives us a map of the hot areas in our infrastructure from a resource consumption standpoint, at a depth that is not possible with the above three types of telemetry data. This makes continuous profiling a critical part of system observability.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E-lt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E-lt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png 424w, https://substackcdn.com/image/fetch/$s_!E-lt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png 848w, https://substackcdn.com/image/fetch/$s_!E-lt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png 1272w, https://substackcdn.com/image/fetch/$s_!E-lt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E-lt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png" width="1456" height="846" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:846,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:361062,&quot;alt&quot;:&quot;Observability&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Observability" title="Observability" srcset="https://substackcdn.com/image/fetch/$s_!E-lt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png 424w, https://substackcdn.com/image/fetch/$s_!E-lt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png 848w, https://substackcdn.com/image/fetch/$s_!E-lt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png 1272w, https://substackcdn.com/image/fetch/$s_!E-lt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd956ccf-0854-4e23-86b5-3bd515512dd3_3841x2233.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft 
pc-display-flex pc-gap-8 pc-reset"></div></div></div></a></figure></div><h2>Back To Our E-commerce Distributed Service Use Case&nbsp;</h2><p>When customers start experiencing delays in payment processing, we can check the logs to analyze the code flow and identify whether any errors or exceptions are reported by the payment microservice or any other microservice that is part of the payment flow.&nbsp;</p><p>We have metrics to gauge issues at the infrastructure level. The resource consumption metrics will show whether any of the servers are overloaded and need horizontal scaling. 
In addition, metrics on the rate of order flow, product purchases and other related user events are how we realized, in the first place, that the rate of orders created on the website was dropping.&nbsp;</p><p>We can further leverage traces to follow requests through the entire product purchase flow and pinpoint the specific system components experiencing the issue. These components can be load balancers, caches, microservices, API gateways, databases, message queues, and so on.</p><p>Finally, we have the continuous profiling data to analyze our system at a much deeper level, pinpointing whether any specific code, service or component is hogging resources.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LzAQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LzAQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png 424w, https://substackcdn.com/image/fetch/$s_!LzAQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png 848w, https://substackcdn.com/image/fetch/$s_!LzAQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png 1272w, 
https://substackcdn.com/image/fetch/$s_!LzAQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LzAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png" width="1456" height="879" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:879,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1175779,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LzAQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png 424w, https://substackcdn.com/image/fetch/$s_!LzAQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png 848w, https://substackcdn.com/image/fetch/$s_!LzAQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png 1272w, 
https://substackcdn.com/image/fetch/$s_!LzAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700a2944-6fc5-4301-873a-6ca1a4c49f3c_5892x3558.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is why system observability is vital to running reliable real-world distributed services. 
We cannot do without it.&nbsp;</p><p></p><h2>Observability-Driven Development</h2><p>Since observability is key to building modern distributed services, quite a number of observability tools/stacks are leveraged in the industry to provide end-to-end observability solutions, such as the ELK (Elasticsearch, Logstash, Kibana) stack, Prometheus, Grafana, Datadog, New Relic, OpenTelemetry, Google Cloud Profiler, etc., each with its own use case.</p><p>Now, let's understand how distributed services are built with observability in mind.&nbsp;</p><p></p><h2>How Distributed Services Are Built With Observability In Mind</h2><p>When we write code, we add monitoring code along with the business logic implementation to enable the monitoring tools to understand the code behavior when the application runs in production. Adding logs to our code is one example of this.&nbsp;</p><p>The process of adding observability code along with our main code is known as instrumentation. When the service runs in production, the telemetry data is streamed to monitoring servers in an automated fashion to enable the devs and support teams to analyze the data on the dashboards of specific observability solutions. This gives us insights into what is happening in our live service.&nbsp;</p><p>Here is a step-by-step process, from writing code on our local machine to running it in production, of how we can ensure our code is performant as well as observable.</p><p>When writing code, we need to ensure relevant logs, error statements, and exceptions are added to help understand the code flow in production. Key events should be logged with appropriate contextual information.</p><p>During the process, we can also micro-benchmark specific lines of code to gauge performance. Once we are done writing code, it's a good idea to profile it using a code profiler for any performance bottlenecks.&nbsp;</p><p>The code should have excellent test coverage with unit and integration tests. 
Well, this goes without saying. Tests validate the correctness of code.</p><p>Static code analysis with relevant tools is also done during this phase to check the code for memory leaks, adherence to the organization's code style, duplicate sections, vulnerabilities, etc. The whole process is more like an automated code review.&nbsp;</p><p>You may or may not see static code analysis as directly related to performance and observability; I brought it up since it's a good development practice.</p><p>As the code is pushed to the remote repo, an automated build is triggered on the CI (Continuous Integration) server, which ideally runs the same checks the dev ran on their local system, along with additional scripts.&nbsp;</p><p>After a successful build, the code is deployed to the staging, testing or pre-prod environment, based on the organization's practice, where it is stress-tested under simulated traffic. This is where metrics come in handy.&nbsp;</p><p>They give us insights into the system's behavior, bottlenecks, scalability issues, and more when the system is subjected to heavy traffic, as I discussed at the beginning of the post.&nbsp;</p><p>Once the code is deployed to production, continuous monitoring kicks in to keep tabs on the infrastructure resource usage and application behavior in real-time.</p><p>Along with keeping an eye on the infrastructure, we analyze errors, exceptions and logs in real-time, leveraging error-tracking tools. The alerts and notifications we have set up get triggered if the error rates go beyond a certain threshold.</p><p></p><h2>Before Beginning to Write Code: Observability Planning at the System Design Stage</h2><p>Determining what points, metrics, and processes to monitor is crucial before we start coding the service. Observability planning should happen at the design phase of our distributed service. 
Planning observability in the design phase enables us to accurately collect system metrics from containers, services, servers, and other specific components of our system architecture.&nbsp;</p><p>We should be aware of where to instrument and what contextual information to add to the telemetry data before beginning to write code. When we are aware of the observability specifics, we can design our code in a way that enables us to modify and adapt system observability on an ongoing basis without significant code refactoring.&nbsp;</p><p>Folks, I believe this newsletter post gave you a detailed insight into the process of designing and developing performant and observable distributed services.&nbsp;</p><blockquote><p>If you wish to delve deeper into the fundamentals of designing large-scale distributed systems, along with an understanding of the cloud infrastructure on which web-scale services run, check out the <a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency learning path</a>, authored by me. It comprises three courses that take you right from zero to a comprehensive insight into the fundamentals of distributed system design.&nbsp;</p></blockquote><blockquote><p>Additionally, if you wish to learn to code distributed systems from the bare bones, I am running a series on it in this newsletter. <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">Do check it out here</a>.&nbsp;</p><p>If you wish to practice coding distributed systems like Redis, Docker, Git, a DNS server and more from the bare bones in the programming language of your choice, check out&nbsp;<a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>&nbsp;(Affiliate). 
With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it&#8217;s done.&nbsp;</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase.</p></blockquote><p>You can get a 50% discount on my courses by sharing my posts with your network. Based on referrals, you can unlock course discounts. Check out the&nbsp;<a href="https://shivangsnewsletter.com/leaderboard">leaderboard page</a>&nbsp;for details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f4Dl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f4Dl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!f4Dl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!f4Dl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png 1272w, 
https://substackcdn.com/image/fetch/$s_!f4Dl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f4Dl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png" width="982" height="367" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:982,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f4Dl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!f4Dl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!f4Dl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png 1272w, 
https://substackcdn.com/image/fetch/$s_!f4Dl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a0047d3-8c20-4f6a-9ac4-46ae39fcc8a5_982x367.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>If you are reading the&nbsp;<a href="https://shivangsnewsletter.com/">web version of this post</a>, consider subscribing to my newsletter.</p><p>You can find me on&nbsp;<a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a>&nbsp;&amp;&nbsp;<a href="https://twitter.com/shivang_z">X</a>. I'll see you in the next post. 
Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[System Design Case Study #2: Building An Image Processing Pipeline, Weeding Out Duplicates With Content Addressable Storage & How Uber Eats DeDuplicates Their Images]]></title><description><![CDATA[Picture a scenario where we need to build a product or a food item image processing feature for our food aggregator app like Swiggy, Zomato or Uber Eats.]]></description><link>https://shivangsnewsletter.com/p/image-processing-pipeline</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/image-processing-pipeline</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Tue, 02 Jan 2024 09:45:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nnvb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Picture a scenario where we need to build a product or a food item image processing feature for our food aggregator app like Swiggy, Zomato or Uber Eats. The feature should enable restaurants to upload images of food items, which are then processed (deduplicated, compressed, and stored) on the backend to be displayed to the end users.&nbsp;</p><p>Since our business is really taking off (OMG :)), the image uploads have grown to the scale of millions per hour and the processing of those images has to happen on the backend accordingly. Weeding out duplicate images is a key functional requirement to cut down on processing and storage costs.&nbsp;</p><p>How would we build such a system?</p><p>One way to build a system like this is to update the core backend service. 
Create an API endpoint that accepts image uploads, process the images on the application server, and store the image metadata in a database and the images in cloud storage.</p><p>But we are dealing with a system where the restaurants would upload and update images concurrently at the scale of millions per hour, in real-time. Also, we have specific business rules to process the images. New requirements could be added in the near future.</p><p>For this, we should build a dedicated image-processing pipeline that ingests images in real-time, processes them, and takes further actions like deduplication, compression, and storing them in a database, cloud storage, or elsewhere, based on the defined business logic.&nbsp;<br></p><h2>Image Processing Pipeline&nbsp;</h2><p>Why the need for a dedicated image processing pipeline as opposed to coding the feature in the existing core service?</p><p>Because image uploads and updates happen concurrently, in real-time, at the scale of millions. This feature is both <a href="https://scaleyourapp.com/single-threaded/">CPU &amp; IO-intensive</a>, requiring heavy compute and horizontal scalability. Coupling it with the existing core backend service can make things complex and exert unnecessary load on the servers running the core application modules.</p><p>It's advantageous to set up a dedicated service that would handle image processing in real-time. This will help us scale different services separately based on the requirements. </p><p>To process the images, we can set up an image processing pipeline using a data/stream processing framework like Apache Flink along with a scalable event queue like Kafka.</p><p><strong>Why Flink?</strong></p><p>Apache Flink is a real-time stream-processing framework that can process both bounded and unbounded data streams at in-memory speed with high throughput at web scale. 
Flink can horizontally scale across a cluster of machines, enabling it to handle large datasets with high throughput.&nbsp;</p><p>It fits our use case. Flink's ability to scale horizontally enables our system to handle large volumes of image-processing tasks concurrently.</p><p>Alternatively, if we were already using Kafka in our architecture, we could consider Kafka Streams. If the images were processed in batches as opposed to in real-time, Apache Spark could have been a strong fit.&nbsp;</p><p>If our core workload were hosted on AWS or GCP, we could have considered AWS Kinesis or Google Cloud Dataflow.&nbsp;</p><blockquote><p>Ideally, picking the right streaming framework requires extensive research and a POC (Proof Of Concept) implementation. We need to consider different factors such as the programming models of the stream processing framework, programming language support, data storage support, reliability, state management, horizontal scalability and many more. </p><p>P.S. I've picked Flink just for the design discussion; it's not a definitive selection.&nbsp;</p></blockquote><p>Back to our case study:</p><p>Kafka, in our system architecture, will act as an event queue (a Flink source) that receives the images uploaded by the restaurants, from which Flink then consumes them.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nnvb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nnvb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png 424w, 
https://substackcdn.com/image/fetch/$s_!nnvb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png 848w, https://substackcdn.com/image/fetch/$s_!nnvb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png 1272w, https://substackcdn.com/image/fetch/$s_!nnvb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nnvb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png" width="1456" height="721" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:721,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:662235,&quot;alt&quot;:&quot;Image processing pipeline without image staging cloud storage&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image processing pipeline without image staging cloud storage" title="Image processing pipeline without image staging cloud storage" 
srcset="https://substackcdn.com/image/fetch/$s_!nnvb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png 424w, https://substackcdn.com/image/fetch/$s_!nnvb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png 848w, https://substackcdn.com/image/fetch/$s_!nnvb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png 1272w, https://substackcdn.com/image/fetch/$s_!nnvb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c5e9d1-f1ab-40fa-a39b-e1bf2f3da42c_4193x2077.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Why can't we directly stream the images to Flink? What's the need for Kafka?</h2><p>Streaming images to Kafka and storing them there temporarily enables Flink to process images at its own pace. If the Flink module runs into issues and goes down, users can continue to upload images to Kafka uninterrupted.&nbsp;</p><p>Later, Flink can catch up when it recovers. It can pull the images from Kafka, process them, and store the metadata in the database and the images in cloud storage. This increases the fault tolerance, reliability and flexibility of the system. Additionally, the ingestion and processing modules of the system are separated, which is an implementation of the separation of concerns principle.&nbsp;<br></p><h2>Additional Image Staging Cloud Storage For Modularity And A More Loosely Coupled Architecture</h2><p>To make our architecture more modular, loosely coupled and scalable, we can separate the image upload, ingestion, processing and storage modules in our image processing pipeline by introducing additional cloud storage in the pipeline for temporarily storing image uploads before the event queue processes them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zHaE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!zHaE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png 424w, https://substackcdn.com/image/fetch/$s_!zHaE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png 848w, https://substackcdn.com/image/fetch/$s_!zHaE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png 1272w, https://substackcdn.com/image/fetch/$s_!zHaE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zHaE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png" width="728" height="399" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:798,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:750661,&quot;alt&quot;:&quot;Image processing pipeline&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image processing pipeline" title="Image processing pipeline" 
srcset="https://substackcdn.com/image/fetch/$s_!zHaE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png 424w, https://substackcdn.com/image/fetch/$s_!zHaE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png 848w, https://substackcdn.com/image/fetch/$s_!zHaE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png 1272w, https://substackcdn.com/image/fetch/$s_!zHaE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b75d34f-322f-440c-85de-8bad0a89dc9c_4286x2348.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The additional cloud storage acts as a landing zone for the uploaded images and forms the image upload module. The event queue (the message broker, Kafka) forms the image ingestion module. The data streaming framework, Flink, is the processing module, while the database that stores the metadata and the cloud object storage that stores the images form the storage module.&nbsp;</p><p>In this image-processing pipeline architecture, all the modules are separated and loosely coupled. Each module can be scaled individually when needed. We could further move data into a search component to make the product images searchable in the app. But let's focus on the current architecture. I'll have a dedicated post on the search component in scalable system architectures in the near future.&nbsp;</p><p>Once an image or a batch of images is uploaded to the landing zone, the system triggers an event for them to be streamed to the event queue, Kafka.</p><p>There is one thing we need to bear in mind, though. Since our system ingests images in real-time via the API, an image upload cloud storage that acts as a landing zone for images might introduce some latency and complexity in the system.&nbsp;</p><p>Whether to use the additional cloud storage depends entirely on the business requirements and our latency budget. If we cannot afford the additional latency of a dedicated image upload module, we can bypass it by pushing the images directly to the event queue.&nbsp;</p><p>So, this was a high-level discussion on building our image processing pipeline. 
We can go into more detail, but that's not really required right now.&nbsp;As I publish more system design case studies, I&#8217;ll continue to delve deeper into specific components of scalable system architectures, giving you a holistic view of working with scalable distributed services.</p><p>Now, let's discuss the image deduplication requirement. How do we ensure the images moving to the data storage are unique?</p><p>For instance, <em>n</em> restaurants can sell a can of Red Bull, and all of them may upload the same can image provided by the company. In this case, why store the same image <em>n</em> times? Why not just store a single copy of the Red Bull image with all references pointing to it, thus saving storage and compute?<br></p><h2>Content-Addressable Storage: Ensuring the Data Stored Does Not Have Duplicates&nbsp;</h2><p>Content-addressable storage is a technique of storing data where, as opposed to addressing it by file name or file location, the data is stored and retrieved by a hash computed from its content. Identical content always produces the same hash, so it is stored only once. This helps eliminate duplicate content, reducing storage space and costs significantly.&nbsp;</p><p>I've discussed content-addressable storage before on my blog in a case study: <a href="https://scaleyourapp.com/system-design-github-code-search-engine/">How GitHub indexes code for blazing-fast search and retrieval</a>. You can give it a read.</p><p>In content-addressable storage, the hashes are computed from the actual content of the data with hash functions such as MD5, SHA-1 or SHA-256 (a member of the SHA-2 family). Since a hash is, for practical purposes, unique to its content and changes whenever the data is updated, this approach provides strong data integrity. Note that MD5 and SHA-1 are no longer considered collision-resistant, so a SHA-2 function is the safer choice when uploads may come from untrusted sources.&nbsp;</p><p>If the data changes over time, different hashes will be created, enabling us to access different versions of data while maintaining a version history.&nbsp;</p><p>CDNs use this technique to efficiently cache and distribute data in edge regions worldwide. 
GitHub leveraged this approach to reduce 115 TB of data to 28 TB of unique content.</p><p>Hash-based content-addressable storage is a great way to eliminate duplicates when storing massive volumes of data. We can leverage this in the processing module of our image processing pipeline to weed out duplicate content, keeping only unique data in the data storage.&nbsp;</p><p>Let's peek into how Uber Eats deduplicates and stores images in their system efficiently using the same technique.<br></p><h2>How Uber Eats Deduplicates And Stores Images In Their System Efficiently</h2><p>Uber Eats handles product image updates on the scale of several hundred million every hour. To keep the image processing, storage and CDN costs down, Uber Eats uses content-addressable storage to detect duplicates in their image processing pipeline.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h-Vq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h-Vq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png 424w, https://substackcdn.com/image/fetch/$s_!h-Vq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png 848w, https://substackcdn.com/image/fetch/$s_!h-Vq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png 1272w, 
https://substackcdn.com/image/fetch/$s_!h-Vq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h-Vq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png" width="1408" height="176" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:176,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!h-Vq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png 424w, https://substackcdn.com/image/fetch/$s_!h-Vq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png 848w, https://substackcdn.com/image/fetch/$s_!h-Vq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png 1272w, 
https://substackcdn.com/image/fetch/$s_!h-Vq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c27d3f8-b7ed-4773-bdd5-7c716a1dd8b7_1408x176.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Img src: <a href="https://www.uber.com/en-IN/blog/deduping-and-storing-images-at-uber-eats/">Uber</a></p><p>To manage the images, they have three maps (key-value pairs): the original image map, the processed image map and the image URL map.&nbsp;</p><p>The original image map has the hash of the image as its <em>key</em> and the original image as its <em>value</em>.&nbsp;</p><p>The processed image map has the hash of the image combined with the processing specification as its <em>key</em> and the processed image as its <em>value</em>.&nbsp;</p><p>And the image URL map has the image URL as its <em>key</em> and the hash of the image as its <em>value</em>.&nbsp;</p><p>Since the original image map and the processed image map contain the images, they are stored in a blob store called Terrablob, which is similar to AWS S3 (an object storage service). The URL map, on the other hand, contains only metadata, so it goes into Docstore, Uber's multi-model database.&nbsp;</p><p>When an image gets uploaded to the pipeline, the system first looks up the image URL map to check if the image URL already exists. If it does, the image hash is read, and then the processed image map is consulted to check if the image has already been processed. If it has, the system knows the image is a duplicate and needs no processing, thus saving compute and storage space.</p><p>If the image is not processed, the system processes it and updates all the required maps. If the URL is not present, the system processes the new image and updates all the maps.&nbsp;</p><p>Here is their image processing system architecture flow chart:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GHie!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GHie!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png 424w, https://substackcdn.com/image/fetch/$s_!GHie!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png 848w, https://substackcdn.com/image/fetch/$s_!GHie!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png 1272w, https://substackcdn.com/image/fetch/$s_!GHie!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GHie!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!GHie!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png 424w, https://substackcdn.com/image/fetch/$s_!GHie!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png 848w, https://substackcdn.com/image/fetch/$s_!GHie!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png 1272w, https://substackcdn.com/image/fetch/$s_!GHie!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f46ffd7-0ece-4a8c-83d9-cec553530bb8_1600x547.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>                                                            Img src: <a href="https://www.uber.com/en-IN/blog/deduping-and-storing-images-at-uber-eats/">Uber</a></p><p>There are a bunch of possible combinations when an image is uploaded to their system. 
Also, they keep both the original uploaded image and the processed image in their storage. <a href="https://www.uber.com/en-IN/blog/deduping-and-storing-images-at-uber-eats/">You can refer to this article for details</a>. But you get the idea of how we can build a data streaming system that weeds out duplicates for system efficiency and reduced storage costs.</p><blockquote><p>If you want to learn to design large-scale distributed systems from the bare bones, along with the discussion on the fundamental concepts starting right from zero, check out the&nbsp;<a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency learning path</a>.&nbsp;</p><p>It comprises three courses I have authored intending to educate you, step by step, on the domain of software architecture, cloud infrastructure and distributed system design.</p></blockquote><blockquote><p>Additionally, if you wish to learn to code distributed systems from the bare bones, I am running a series on it in this newsletter. <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">Do check it out here</a>.&nbsp;</p><p>If you wish to practice coding distributed systems like Redis, Docker, Git, a DNS server and more from the bare bones in the programming language of your choice, check out&nbsp;<a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>&nbsp;(Affiliate). 
With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it&#8217;s done.&nbsp;</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase.</p></blockquote><p>Here is the previous case study if you haven&#8217;t read it yet:</p><blockquote><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;28787c2f-c8d4-41fd-9b43-541ebb9764c2&quot;,&quot;caption&quot;:&quot;Picture a scenario where several microservices in our distributed system architecture leverage cache for performance, to reduce database hits and to lower the operational costs. Every microservice team uses caching libraries, for instance, Redis, Caffeine, etc., directly in their code without a standardized interface. This tight coupling of external tec&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;System Design Case Study #1: Implementing Caching In A Distributed Architecture. 
How DoorDash Did It In Their Microservices Architecture &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:26351479,&quot;name&quot;:&quot;Shivang Sarawagi&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/770fbb16-c4a1-4351-9ca6-e1381dff7dc1_1536x2048.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-12-27T07:14:49.272Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://shivangsnewsletter.com/p/caching-in-microservices-architecture&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:140094725,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Web Scale&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770fbb16-c4a1-4351-9ca6-e1381dff7dc1_1536x2048.jpeg&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Implementing Caching In A Distributed Architecture. How DoorDash Did It In Their Microservices Architecture</p></blockquote><p>You can get a 50% discount on my courses by sharing my posts with your network. Based on referrals, you can unlock course discounts. 
Check out the <a href="https://shivangsnewsletter.com/leaderboard">leaderboard page</a> for details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SwCy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SwCy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!SwCy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!SwCy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!SwCy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SwCy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png" width="982" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6805f16-6a85-468b-9134-ae3e87940746_982x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:982,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SwCy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png 424w, https://substackcdn.com/image/fetch/$s_!SwCy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png 848w, https://substackcdn.com/image/fetch/$s_!SwCy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png 1272w, https://substackcdn.com/image/fetch/$s_!SwCy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6805f16-6a85-468b-9134-ae3e87940746_982x367.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>If you're reading the <a href="https://shivangsnewsletter.com/">web version of this post</a>, consider subscribing to my newsletter.</p><p>You can find me on <a href="https://www.linkedin.com/in/shivang-sarawagi-b7b5881b/">LinkedIn</a> &amp; <a href="https://twitter.com/shivang_z">X</a>. </p><p>I'll see you in the next post. Until then, Cheers!</p>]]></content:encoded></item><item><title><![CDATA[System Design Case Study #1: Implementing Caching In A Distributed Architecture. How DoorDash Did It In Their Microservices Architecture ]]></title><description><![CDATA[Picture a scenario where several microservices in our distributed system architecture leverage cache for performance, to reduce database hits and to lower the operational costs.]]></description><link>https://shivangsnewsletter.com/p/caching-in-microservices-architecture</link><guid isPermaLink="false">https://shivangsnewsletter.com/p/caching-in-microservices-architecture</guid><dc:creator><![CDATA[Shivang Sarawagi]]></dc:creator><pubDate>Wed, 27 Dec 2023 07:14:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7TG2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Picture a scenario where several microservices in our distributed system architecture leverage cache for performance, to reduce database hits and to lower the operational costs.&nbsp;</p><p>Every microservice team uses caching libraries, for instance, Redis, Caffeine, etc., directly in their code without a standardized interface. 
This tight coupling of external tech with the local code not only makes the overall system code messy but also leaves us with little control over, and observability into, the cache implementation in our system.&nbsp;</p><h4>But why do we need control and observability over the cache implementation in our system architecture?</h4><p>It's critical that the cache stays in sync with the original data source. Depending on the use case, stale data can break the business logic. Different business use cases have different staleness tolerance, and fixing issues arising out of data staleness can be time-consuming and complex.</p><p>If we have real-time control over the cache implementation, we can turn the cache on and off in our system without any code redeployment. If the cache needs tuning or any adjustment, we can apply that across our system architecture as well.</p><p>In addition, we can route a percentage of the traffic to the original data source and continuously compare the results with the cached values to verify that the cached data stays consistent with the data source. This technique is called cache shadowing.</p><p>We can also track cache hit rates, error percentages and similar metrics for further observability. Keeping tabs on this data helps in coming up with an effective cache invalidation strategy.</p><p>Having control over the cache implementation in our distributed architecture also lets us bail out on the existing implementation and plug in a different caching tech altogether when needed.&nbsp;</p><p>This is possible primarily when we use a standardized caching interface in our code as opposed to tightly coupling the cache directly.&nbsp;</p><h4><br>Caching In Microservices Architecture At DoorDash</h4><p>DoorDash faced a similar issue when using Caffeine as a local cache and Redis as a distributed cache in their microservices architecture.&nbsp;</p><p>Most microservices teams at DoorDash directly plugged these libraries into their code, which made things messy. 
To tackle this, DoorDash developed a single caching interface and a multi-layered caching system for cache implementation across their system architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://lh3.googleusercontent.com/8x3mI0XC1i41tpnubmKzfYttVFz1W3D5b7SsgN41aF4Yw0RIfEnIolgX7fYozWmFA-LHdoIvqxOJH9-U8V9BJBvKvDTZc7CzpYlpVgO_zHE4UO8eUpzztfZLWthgDpev4XRH784PZG7inkqwTUbKN78" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7TG2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png 424w, https://substackcdn.com/image/fetch/$s_!7TG2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png 848w, https://substackcdn.com/image/fetch/$s_!7TG2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png 1272w, https://substackcdn.com/image/fetch/$s_!7TG2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7TG2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png" width="1456" height="855" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:855,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:511575,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://lh3.googleusercontent.com/8x3mI0XC1i41tpnubmKzfYttVFz1W3D5b7SsgN41aF4Yw0RIfEnIolgX7fYozWmFA-LHdoIvqxOJH9-U8V9BJBvKvDTZc7CzpYlpVgO_zHE4UO8eUpzztfZLWthgDpev4XRH784PZG7inkqwTUbKN78&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7TG2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png 424w, https://substackcdn.com/image/fetch/$s_!7TG2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png 848w, https://substackcdn.com/image/fetch/$s_!7TG2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png 1272w, https://substackcdn.com/image/fetch/$s_!7TG2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05abfa13-b9cd-4b66-9e98-f08db0a78d6e_3816x2242.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The first layer is the request local cache powered by a simple HashMap that contains data only for the lifetime of the request. The request cache is the cached response for the initial request.</p><p>The second layer (Caffeine implementation) is scoped across the JVM, where the cache data is visible to all workers in a single JVM. <a href="https://doordash.engineering/2023/10/19/how-doordash-standardized-and-improved-microservices-caching/">The DoorDash article</a> doesn&#8217;t specify what workers mean here explicitly. I reckon it&#8217;s the worker threads running in a JVM, not the application instances.&nbsp;</p><p>The third layer contains the cached data for all the pods interacting within the same Redis cluster. 
A pod contains one or more containers running on a single node in a Kubernetes cluster.</p><p>This multi-layered caching system could be integrated transparently with every microservice with minimal disruption. In this system, a cache request progresses through the layers until the value for the key is found. If the value is retrieved from a later layer, it is also stored in the earlier layer so that subsequent requests get faster access.&nbsp;</p><p>To measure cache performance, cache hits and misses are recorded. To ensure data freshness, a cache shadowing mechanism is implemented: for a percentage of cache reads, it also invokes the fallback of the given cache layer and compares the cached and fallback values for equality.&nbsp;</p><p>This is done on an ongoing basis, and metrics on successful and unsuccessful matches are graphed and alerted on. This data is key for building an effective cache invalidation strategy.</p><p>In the multi-layered caching system, individual caches can be turned off, based on the requirements, by setting their TTL to zero. For use cases that can tolerate a degree of cache staleness, all the caching layers are leveraged. For cases where data consistency is critical, a few or all layers are turned off so that the requests directly hit the database or the single source of truth.</p><p></p><h4>Key System Design And Backend Engineering Lessons From This Case Study:</h4><p>1. Always implement an abstraction layer when integrating third-party code with your code. Tightly coupling third-party code with our code isn't a great idea. It makes the code messy and also prevents us from bailing out on a technology when required.</p><p>Here is a quick code example for this:</p><p>Let's say we intend to integrate multiple caching libraries like Redis and Caffeine into our code. 
We will create a CacheService interface as opposed to directly using the Redis or Caffeine code in our classes.&nbsp;</p><p>The CacheService serves as an abstraction that averts the need for significant code refactoring if we need to switch to a different caching tech.&nbsp;</p><pre><code>public interface CacheService {    
   String getValue(String key);
   void setValue(String key, String value);
}</code></pre><p>We will have separate classes for every cache library implementation.&nbsp;</p><pre><code>import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.stereotype.Service;

@Service
public class CaffeineCacheService implements CacheService {
    private final Cache&lt;String, String&gt; cache;

    public CaffeineCacheService() {
        // Illustrative sizing; tune the bound and expiry per use case
        this.cache = Caffeine.newBuilder().maximumSize(10_000).build();
    }

    @Override
    public String getValue(String key) {
        // Returns null on a cache miss
        return cache.getIfPresent(key);
    }

    @Override
    public void setValue(String key, String value) {
        cache.put(key, value);
    }
}</code></pre><pre><code>import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class RedisCacheService implements CacheService {
    // Assumes Spring Data Redis is on the classpath; any Redis client would do
    private final StringRedisTemplate redis;

    public RedisCacheService(StringRedisTemplate redis) {
        this.redis = redis;
    }

    @Override
    public String getValue(String key) {
        // Returns null on a cache miss
        return redis.opsForValue().get(key);
    }

    @Override
    public void setValue(String key, String value) {
        redis.opsForValue().set(key, value);
    }
}</code></pre><p>Now, if I want to use these caches in my DataService class, the class will interact with the CacheService interface as opposed to directly interacting with the Redis and Caffeine code.&nbsp;</p><pre><code>import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
public class DataService {
    private final CacheService caffeineCacheService;
    private final CacheService redisCacheService;

// Two beans implement CacheService, so each parameter names the bean it wants
@Autowired
public DataService(@Qualifier("caffeineCacheService") CacheService caffeineCacheService,
                   @Qualifier("redisCacheService") CacheService redisCacheService) {
    this.caffeineCacheService = caffeineCacheService;
    this.redisCacheService = redisCacheService;
}

public String getCaffeineCachedData(String key) {
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp;return caffeineCacheService.getValue(key);
}
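
// Sketch of the layered read described earlier (an assumption about how it could
// look in this class, not DoorDash's actual code): check the local Caffeine layer
// first, fall through to Redis on a miss, and backfill the earlier layer on a hit
// in the later one. Assumes getValue returns null on a miss.
public String getLayeredCachedData(String key) {
    String value = caffeineCacheService.getValue(key);
    if (value == null) {
        value = redisCacheService.getValue(key);
        if (value != null) {
            // Store in the earlier layer so subsequent reads are served locally
            caffeineCacheService.setValue(key, value);
        }
    }
    return value;
}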

public String getRedisCachedData(String key) {
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp;return redisCacheService.getValue(key);
 }
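
// Sketch of cache shadowing as described earlier. The method name, the
// sourceOfTruth parameter (a stand-in for the database call) and the 1% sample
// rate are hypothetical, chosen only for illustration.
public String getWithShadowing(String key, java.util.function.Supplier&lt;String&gt; sourceOfTruth) {
    String cached = redisCacheService.getValue(key);
    // For roughly 1% of cache hits, also read the source of truth and compare
    if (cached != null &amp;&amp; java.util.concurrent.ThreadLocalRandom.current().nextDouble() &lt; 0.01) {
        String fresh = sourceOfTruth.get();
        if (!cached.equals(fresh)) {
            // A real system would emit a metric here; this sketch only logs the mismatch
            System.err.println("Cache shadow mismatch for key " + key);
        }
    }
    return cached;
}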
}</code></pre><p>2. The ability to configure a technology implementation with minimal rollbacks and redeployments, along with infrastructure observability, is critical for the reliability, availability and scalability of our system architecture. We need to keep this in mind when designing and implementing our architecture.&nbsp;</p><pre><code>If you wish to take a deep dive into the fundamentals of designing a large-scale service, check out the <a href="https://learnsoftwarearchitecture.com/">Zero to Software Architecture Proficiency learning path</a>, comprising three courses I have authored to teach you, step by step, software architecture, cloud infrastructure and distributed system design.

This learning path offers you a structured learning experience, taking you from having no knowledge of the domain to being a pro at designing web-scale distributed systems like YouTube, Netflix, ESPN and the like.&nbsp;<a href="https://learnsoftwarearchitecture.com/">Check it out</a>.</code></pre><blockquote><p>Additionally, if you wish to learn to code distributed systems from the bare bones, I am running a series on it in this newsletter. <a href="https://shivangsnewsletter.com/p/distributed-programming-part-1">Do check it out here</a>.&nbsp;</p><p>If you wish to practice coding distributed systems like Redis, Docker, Git, a DNS server and more from the bare bones in the programming language of your choice, check out&nbsp;<a href="https://codecrafters.io/?via=techPackets">CodeCrafters</a>&nbsp;(Affiliate). With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it&#8217;s done.&nbsp;</p><p>You can use&nbsp;<a href="https://app.codecrafters.io/join?via=techPackets">my unique link to get 40% off</a>&nbsp;if you decide to make a purchase.</p></blockquote><p>If you found the content insightful, do share it with your network for more reach and consider subscribing to my newsletter. </p><p>You can read the previous system design case studies on my blog:</p><pre><code><a href="https://scaleyourapp.com/system-design-case-study-real-time-messaging-architecture/">Exploring Slack&#8217;s Real-time Messaging Architecture</a> 

<a href="https://scaleyourapp.com/how-discord-scaled-their-member-update-feature/">How Discord Scaled Their Member Update Feature Benchmarking Different Data Structures</a> 

<a href="https://scaleyourapp.com/system-design-github-code-search-engine/">How GitHub Indexes Code For Blazing Fast Search &amp; Retrieval</a>

<a href="https://scaleyourapp.com/svelte-at-stack-overflow/">Why Stack Overflow Picked Svelte for their Overflow AI Feature And the Website UI</a>

<a href="https://scaleyourapp.com/scaling-a-stateful-service">How WalkMe Engineering Scaled their Stateful Service Leveraging Pub-Sub Mechanism</a>

<a href="https://scaleyourapp.com/in-memory/">In-Memory Storage &amp; In-Memory Databases &#8211; Storing Application Data In-Memory To Achieve Sub-Second Response Latency</a></code></pre><p>I&#8217;ll see you in the next post. Until then, Cheers!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/p/caching-in-microservices-architecture?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/p/caching-in-microservices-architecture?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://shivangsnewsletter.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://shivangsnewsletter.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>