Cloudflare enables its customers to run serverless code globally at the edge at blazing speeds, with near-zero cold start time.
It achieves this with a V8-powered deployment architecture, as opposed to the traditional containers-and-Kubernetes approach.
The prime reason for not using containers or VMs is to achieve sub-millisecond serverless latency while supporting a very large number of tenants, each running their workloads at the edge independently, without sharing memory or state.
We are aware that serverless instances spin up to handle requests and spin down when idle to save costs; this is a trade-off between latency and running costs. Spinning up a container or a VM to process a request takes anywhere from 500 ms to 10 seconds, resulting in unpredictable code execution time.
Cloudflare's V8 isolate architecture, in contrast, warms up a function in under 5 milliseconds. However, like any other design decision, this approach has trade-offs, which I'll discuss in the later part of this post.
Let's first delve into the V8 isolate design.
Cloudflare V8 isolate architecture
The V8 isolate architecture leverages the V8 engine (a high-performance JavaScript and WebAssembly engine originally developed by Google for Chrome) to run isolates: lightweight sandboxed environments, each running an individual workload.
A single V8 engine instance can run multiple isolates running independent workloads with strict code isolation. Code from one isolate cannot access the memory or data of another.
However, all these isolates share the same OS process, which makes them extremely lightweight and fast.
An isolate has its own mechanism to ensure safe memory access, which makes it possible to run untrusted code from many different customers/tenants within a single V8 engine instance run by an operating system process.
Every isolate executes code in an event-driven model, which suits IO-intensive tasks.
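To make the event-driven model concrete, here is a minimal sketch of a Workers-style handler in the modules syntax: the platform invokes `fetch()` per request, with no server process to boot. The handler body and the example path are illustrative; it runs as-is on any runtime with the standard `Request`/`Response`/`URL` globals (e.g. Node 18+).

```javascript
// Minimal Workers-style handler: the runtime calls fetch() for each
// incoming request; there is no process or container to boot first.
const worker = {
  async fetch(request) {
    const url = new URL(request.url);
    // Respond directly from the handler; no origin round trip needed.
    return new Response(`Hello from the edge, path: ${url.pathname}`, {
      headers: { "content-type": "text/plain" },
    });
  },
};
// In an actual Worker, this object would be the module's default export:
// export default worker;
```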
Cold start latency with traditional containerized serverless deployments
In a traditional serverless architecture, a request arriving when no warm function instance exists triggers the startup of a container, which includes an OS layer, the application dependencies, and the function code.
Initializing a container incurs a time delay, aka the cold start latency, which can range from hundreds of milliseconds to multiple seconds.
Also, since a function instance typically runs in its own container process with process-level isolation, running a large number of tenant workloads serving millions of concurrent requests becomes prohibitively resource-intensive.
In cloud environments, isolation is typically achieved at the VM or container level. Each workload runs as a separate container or VM with its own dedicated resources, reducing the risk of data leaks and unauthorized access.
Furthermore, if a certain process crashes, it doesn't affect the other workloads running on the same machine.
V8 isolates
In contrast to containerized deployments, an isolate running inside the V8 runtime starts in under a millisecond on average, since no container or OS process has to boot.
The V8 runtime is already running, and it spawns an isolate on demand to process a request.
In this architecture, multiple isolates run within the same V8 engine instance with minimal resource overhead, enabling the platform to host and run code for a large number of tenants at the edge and to scale almost instantaneously.
Also, since a single OS process can run hundreds or thousands of isolates, this averts the inter-process context switching costs associated with container-based deployments.
This is crucial for Cloudflare as it runs thousands of tenant workloads at the edge on every machine and needs to rapidly switch between them thousands of times per second with minimal overhead.
If these workloads ran as separate containerized processes, the infrastructure could support far fewer customers. This is the key requirement behind a resource-optimized, scalable deployment approach like V8 isolates.
Understanding the user space and kernel space
Operating systems are split into user space and kernel space to separate application tasks from the core system tasks. Keeping control over how applications interact with the underlying OS and hardware resources is essential for security and performance.
The kernel space runs the core OS operations with direct and unrestricted access to the underlying hardware resources. These operations involve handling IO, memory, process and thread management, etc.
All user applications, like browsers, code editors, etc., run in the user space and interact with the underlying hardware through the kernel. Application code running in the user space is sandboxed and has to request access to system resources and other kernel services via system calls into the kernel space.
The separation between the user space and the kernel space is a fundamental design principle in operating systems, facilitating secure, fault-tolerant, and efficient execution of operations.
V8 engine, isolates, and the underlying OS
The V8 engine runs in the user space and interacts with the OS through the kernel space via system calls. It provides a sandboxed environment to run JavaScript and WebAssembly code.
When isolates process requests in an event-loop, asynchronous-IO fashion, the business logic runs in the user space, while IO tasks rely on the OS kernel to handle the underlying network and other system interactions.
Each V8 engine instance runs in a user-space process with a process ID and with memory boundaries enforced by the OS. This provides a secure sandboxed V8 execution environment.
The memory allocated to a V8 instance is distributed across the isolates by the V8 engine. As the number of isolates increases, a V8 engine instance may request more memory from the OS.
V8 isolates provide strict tenant-level isolation. Though they technically reside in the same OS-allocated memory space inside a V8 engine instance, the V8 architecture ensures no data is shared between these isolates by allocating independent logical memory for each isolate, enforcing strict isolate-level sandboxing.
Each isolate’s data (objects, functions, closures, etc.) is stored in a distinct memory region managed by the V8 engine. Furthermore, each isolate has its own garbage collection process, upholding strict isolation.
Limitations of the V8 isolate architecture
The isolate architecture is a well-engineered solution for deploying latency-sensitive multi-tenant workloads. However, it only supports JavaScript and languages like Go and Rust whose code can be compiled to WebAssembly.
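For a feel of the WebAssembly path, here is a sketch that instantiates a tiny hand-assembled module exporting an `add` function. In practice this bytecode would come out of a Go or Rust compiler rather than being written by hand; the byte array below is the classic minimal example module.

```javascript
// A hand-assembled WebAssembly module exporting add(a, b) -> a + b.
// Real workloads would ship bytecode compiled from Go, Rust, etc.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // body: local.get x2, i32.add
]);

const ready = WebAssembly.instantiate(bytes).then(({ instance }) => {
  console.log(instance.exports.add(2, 3)); // prints 5
  return instance;
});
```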
Other cloud serverless offerings leveraging containerized and VM-based deployments support a range of programming languages based on workload requirements.
Furthermore, containers and VMs provide process-level isolation, often preferred by enterprise workloads with strict security and compliance standards (primarily in the finance, government, and healthcare space), which isolates cannot offer.
Also, isolates aren't designed for compute-intensive and long-running tasks. They have strict memory and compute limits set by the platform, suiting short IO-intensive lightweight operations running at the edge.
A discussion post on IO-bound, CPU-bound, and memory-bound operations is in the pipeline, along with the subsequent posts in the 'Coding a message broker from the bare bones' series.
I'll also discuss the Firecracker MicroVMs by AWS, which are lightweight virtual machines providing startup times close to isolates while retaining the flexibility of containers. Stay tuned.
If you haven't subscribed yet, do subscribe to my newsletter to have my posts slide into your inbox as soon as they are published.
Resources to upskill on cloud computing and implementing distributed systems
To learn the fundamentals of cloud computing, including concepts like FaaS, containers, VMs, deployment infrastructure and more, do check out my cloud course. It's a part of the zero to system architecture bundle, which takes you from zero to mastering the fundamentals of system architecture.
Furthermore, to code distributed systems like Git, Redis, Kafka, and more from the bare bones in the programming language of your choice, check out CodeCrafters.
With their interactive, hands-on exercises, you'll get a good understanding of how these systems work. If you decide to make a purchase, you can use my unique link to get 40% off (affiliate).
If you wish to delve further into V8 isolate architecture, do go through this resource.
Recommended reads on systems programming
I've implemented a single-threaded and multithreaded TCP/IP server in Java. You can go through the linked posts to understand how servers function.
I am writing a crash course on building a distributed message broker like Kafka from the bare bones. You can read the first post here.
Also, check out my post on Serverless Compute & Storage At the Edge With Stateless & Stateful Functions. It’s an insightful read.
If you found this post insightful, consider sharing it with your friends for more reach. You can find me on LinkedIn & X and can chat with me on Substack chat as well. I'll see you in the next post.
Until then, Bye!