Coding projects for software developers - Part 2 - Build components and plumb them together
This is part two of my coding projects for developers post series. Do check out part one, which is the introductory part containing eight well-researched industry-relevant coding projects for you.
I am also running a Discord server where you can share your coding progress with us in addition to having project development discussions with the community.
With that being said, let's get started.
Project 9: Code a text editor from scratch (This is an interesting project; read to know why)
Tags: Data structures, String algorithms
Level: All
Build your own text editor is an instruction booklet that helps you build your own text editor in C programming language.
It's about 1000 lines of C in a single file with no dependencies and implements all the basic features you would expect in a minimal text editor, including syntax highlighting and search.
Here is a reimplementation of the same project in Rust.
Here is the GitHub repo of another text editor written in Lua and C. You'll find the project doc here.
Key learnings from this project
When you code a text editor from scratch, it will put your data structures and algorithm knowledge (string algorithms primarily) to the test.
Common data structures and algorithms that are used to implement a text editor
Some of the primary data structures and algorithms that are used to implement a text editor app are:
Gap buffer
The gap buffer data structure, which is a dynamic array, is used for efficient insertions and deletions at the cursor position. It maintains a gap, which is a sequence of unused elements within the array; the gap is moved to the cursor position when the user performs an insertion or deletion operation.
The average time complexity of inserting elements at the current cursor position with this data structure is O(1), which makes it fitting for applications that need to modify data with low latency.
Rope
A rope data structure is a binary tree that is leveraged to efficiently store and manipulate large strings. Text operations that require frequent insertions, deletions and concatenations of substrings can be performed more efficiently with minimal delay with ropes compared to traditional array-based strings.
Piece table
The piece table DS maintains two buffers: one for the original text and another for the modifications. It keeps track of what parts of the text need to be displayed and in what order without modifying the original text.
The data structure efficiently manages the undo and redo operations by keeping a history of changes. It also reduces the need for frequent memory reallocations. Only the modified parts of text are stored in the add buffer, which is more memory-efficient than copying the entire text for each change.
Stack
Stack can be used to implement the undo and redo operations as well, as it follows the Last In, First Out (LIFO) principle, making it suitable for maintaining the history of operations for undo/redo functionality.
However, it can become inefficient if the history of actions is extensive or if complex actions need to be reversed. In this scenario, the piece table data structure is more efficient in maintaining and applying a history of changes when handling large texts with a significant number of modifications.
Linked list
With linked lists, we can have each line of text as a node, allowing efficient insertion and deletion of lines. Cursor movements can be handled by traversing the list.
If we need more fine-grained control we can have each node store a character enabling insertions and deletions at any position within the text. Linked lists allow for O(1) insertions and deletions, which makes them fit for a text editor where the content changes frequently.
Furthermore, this data structure does not have a predefined size, enabling the text editor to handle arbitrarily large texts efficiently.
Trie
The trie data structure stores strings as a series of linked nodes and facilitates fast lookup for prefixes. Autocomplete and spell-check features are implemented using tries.
Suffix tree
Suffix trie facilitates efficient search for substrings and pattern matching within the text. This structure contains all the suffixes of a given text as their keys and the positions in the text as their values.
String algorithm
Several string-based search algorithms are leveraged to manage text in editors, such as Knuth-Morris-Pratt (KMP), Boyer-Moore, Rabin-Karp algorithm, etc.
KMP algorithm is efficient for searching substrings and pattern matching in large texts. Boyer-Moore does a similar thing by leveraging heuristics.
Greedy algorithm and dynamic programming can be leveraged for implementing line breaks at appropriate positions in the editor to fit within a given width.
Related read: String search algorithms
To further delve into the details of text editing, check out this book, 'The Craft of Text Editing'.
If you implement this project, you'll have good hands-on practice of your data structures and algorithms. In addition, you'll learn many backend engineering concepts that I'll discuss in the subsequent project.
Also, you may not have to implement all the standard text editor features on the first go. You can pick the important ones and code them.
Once you have the text editor ready, let's put it online as a real-time concurrent text editing service like Google Docs.
Project 10: Building a real-time collaboration text editor like Google Docs
Tags: System design, Web, Full stack
Level: Intermediate, Advanced
Regarding building a web service like Google Docs, here are a few resources that'll help you do that:
Freecodecamp post on building a Google Docs clone with React, Firebase and Material UI.
This LogRocket article builds the same with plain JavaScript and Firebase on the backend.
And this video tutorial does the same with React, Socket.io, and MongoDB.
The above resources will help you implement your web service. You may or may not use the same tech for your project. Pick the tech you are most comfortable with and fits the use case.
Besides coding the web service, you would need to know a few system design concepts like CRDTs and operational transformation to understand how conflicted data (when multiple users are modifying the same content) is dealt with in real-time collaboration services.
CRDT (Conflict-free Replicated Data Types)
A conflict-free replicated data type is a data structure that is replicated across multiple nodes in a distributed system and enables concurrent updates to a resource without coordinating with other nodes in real-time. The concurrent updates are merged eventually.
A conflict resolution algorithm that is a part of this data type automatically resolves the inconsistencies that might occur. However, saying this doesn't make resolving concurrent conflicts any easier. While implementing CRDTs, we need to design conflict-resolving logic and test the scenarios thoroughly.
An overly simple example of CRDT is having a boolean counter to track the occurrence of an event in a distributed system. If that event occurs on any of the nodes in the system, the flag is set to true. When the writes are merged eventually (with a node having the flag as true and others as false), the final resolved result is true: the event has occurred.
As the name implies (conflict-free replicated data types), different data types are leveraged in CRDTs like the stack, queue, list, counter, etc., equipped with different techniques to resolve conflicts efficiently.
Redis CRDTs
Redis uses CRDTs to automatically handle conflicting concurrent reads and writes across multiple Redis nodes spread across the globe. CRDTs are implemented using a global database spanning multiple clusters.
These databases are called Conflict-free Replicated Databases or CRDBs.
CRDBs are local databases in every cluster, behaving just like a regular Redis database, establishing replication between each other, facilitated by a process called CRDB Syncer.
CRDBs resolve conflicts in concurrent write operations based on defined rules per data type. The CRDB Syncer communicates the writes to nodes in a streaming fashion, compressing the data in motion. The sync process is intelligent enough to handle the interruptions due to network partitions and other reasons.
If you are intrigued and want to read more on how Redis handles conflicting writes based on the data type, what are the rules involved per data type? Check out this and this resource.
Related read: Riot Games leverages CRDTs to implement the in-game chat system of their game League of Legends.
Operational Transformation
Operational transformation is a technology or a technique primarily used in resolving conflicts in collaborative systems, such as online text editors like Google Docs, where multiple users work on the same content concurrently.
How Google Docs leverages operational transformation to resolve concurrent data conflicts
Every user has their own local copy of data to work on in a lock-free, non-blocking manner. The changes are asynchronously propagated across the system delivering all the users a consistent view. To achieve a consistent view, the data is transformed before being displayed to the other users.
Here is how:
Imagine two users, A and B, working on the same text concurrently. Consider a string "abc". User A adds a character z at position 0. For him, the string will now be "zabc"; the same is updated on the server.
The string for user B is still "abc" for now and she deletes the character c at position 2. On the server, the revised string after deleting the character at position 2 will be "zac" which is incorrect since the deleted character was c, not b. The correct string should be "zab"
To avoid these inconsistencies, the changes before being replicated across are transformed operationally or, in simple words, passed through a function with specified rules to achieve the correct effect.
So, the change made by user B before being replicated will be passed through an OT function for the correct effect. Once passed, the string "zab" will be updated on the screens of all the users working on it concurrently.
Several operational transformation algorithms have been designed with different capabilities and consistency models for different applications.
MongoDB Operational Transformation
MongoDB Atlas app service leverages operational transformation to resolve conflicts with predefined conflict resolution rules.
Taking the example from the documentation:
Matt and Sarah are working on data for their dog-walking business. Matt deletes data on one of their client's dogs, Doug, as they no longer need to walk him. While Sarah is out without an internet connection, she edits Doug's required walk time data on her local, offline version, as she does not know about Matt's deletion of Doug's data.
Once Sarah regains the internet connection, her change is sent to the server. As deletes always win according to App Services' conflict resolution rules, Matt's deletion is kept by the server as opposed to Sarah's edit. The server will not send Sarah's edits to Matt's device. After the transformation, the data is again in agreement across Matt and Sarah's devices.
Here are a few good reads on CRDTs for further exploration:
Designing data structures for collaborative apps
A simple approach to building a real-time collaborative text editor
Yjs is a high-performance CRDT for building collaborative applications that sync automatically. Many services use it to implement a real-time collaboration feature. Do check out the docs for the implementation insights.
Furthermore, run a Google search for, 'How to build a real-time collaborative app using CRDTs' you'll get a bunch of tutorials.
A case study on building a real-time collaborative text editor for the browser from scratch in JavaScript
Conclave is an open-source, real-time, collaborative text editor for the browser built from scratch in JavaScript.
It uses CRDTs to make the users stay in sync and WebRTC to enable users to send messages directly to one another over a decentralized peer-to-peer network.
Here is a detailed case study on how they built it, the backend architecture, etc. It's a pretty insightful read.
Plumbing everything together
If you code the above two projects clubbing the components together, it will provide you with a solid hands-on learning experience. The resources that I have shared have implementation in varied technologies, in addition to having discussions on system design concepts as well.
The right way to implement your project would be to go through the resources, understand the concepts and implement your code from scratch in any programming language and technology stack of your choice. If you understand the fundamentals and the associated concepts, implementing the project won't be a problem for you.
Moreover, if you have any issues with the implementation, we have our Discord server where you can discuss things with us.
Also, this is a recommended read: How to wrap our heads around large codebases and open-source GitHub repositories
Furthermore, if you wish to master web architecture and system design starting from zero, check out my Zero to Software Architecture Proficiency learning path, comprising three courses that go through all the associated concepts in an easy-to-understand language. The courses educate you, step by step, on the domain of web architecture, cloud infrastructure and distributed system design.
I am also implementing a distributed system from the bare bones. Here is part one of the series of newsletter posts that I'll be publishing on that.
Additionally, if you want to start your journey of implementing a distributed system from the bare bones, check out CodeCrafters (Affiliate). It is a platform that helps us code systems like Redis, Docker, Git, a DNS server, and more step-by-step from the bare bones in the programming language of our choice.
It's designed to help developers learn how to build complex systems from the ground up, focusing on teaching the internals of distributed systems and other related technologies through hands-on, project-based learning.
Each project is broken down into stages, with each stage focusing on implementing a specific feature or component of the system. Do check it out and kickstart your hands-on systems programming learning. If you decide to make a purchase, you can use my unique link to get 40% off.
More projects are coming soon to this list. If you wish to be notified of my future posts, including the new project additions to this list, do subscribe to this newsletter (if you haven’t yet).
Also, if you found this post insightful, please do share this web URL with your network more reach.
You can find me on LinkedIn & X and can chat with me on Substack chat as well. I'll see you in the next post. Until then, Cheers!