lander: added post

2023-10-14 21:40:45 +02:00 · 2023-10-14 21:40:45 +02:00 · 0897a275ee
parent ac6b2cab5b
commit 0897a275ee
3 changed files with 149 additions and 5 deletions
--- a/config.toml
+++ b/config.toml
@@ -8,7 +8,7 @@ pygmentsUseClasses = true

 [params]
  description = "The Rusty Bever"
-  copyright = "Copyright © 2022 Jef Roosens"
+  copyright = "Copyright © 2023 Jef Roosens"
  dark = "auto"
  highlight = true

--- a/content/links/index.md
+++ b/content/links/index.md
@@ -4,8 +4,8 @@ title: "Links"

 ### Vieter

-Vieter is an implementation of an Arch repository server written in V, combined
-with a build system.
+An implementation of an Arch repository server combined with a build system,
+written in V.

 * [Source](https://git.rustybever.be/vieter-v/vieter)
 * [Docs](/docs/vieter)
@@ -13,8 +13,8 @@ with a build system.

 ### Alex

-Alex is a Rust program that wraps a Minecraft server process and automates
-creating incremental backups.
+Minecraft server process wrapper that automates creating (incremental) backups,
+written in Rust.

 * [Source](https://git.rustybever.be/Chewing_Bever/alex)

@@ -25,3 +25,9 @@ that I've designed to update the hosted files using POST requests from my CI.

 * [Backend Source](https://git.rustybever.be/Chewing_Bever/site-backend)
 * [Blog Source](https://git.rustybever.be/Chewing_Bever/site)
+
+### Lander
+
+My home-grown URL shortener & pastebin, written from the ground up in C.
+
+* [Source](https://git.rustybever.be/Chewing_Bever/lander)
--- a/content/posts/lander/index.md
+++ b/content/posts/lander/index.md
@@ -0,0 +1,138 @@
+---
+title: "Designing my own URL shortener"
+date: 2023-10-14
+---
+
+One of the projects I've always found to be a good choice for a side project is
+a URL shortener. The core idea is simple and fairly easy to implement, yet it
+leaves a lot of room for creativity in how you build it. Once you're done with
+the core idea, you can start expanding the project as you wish: expiring links,
+password protection, or perhaps a management API. The possibilities are
+endless!
+
+Naturally, this post talks about my own version of a URL shortener:
+[Lander](https://git.rustybever.be/Chewing_Bever/lander). In order to add some
+extra challenge to the project, I've chosen to write it from the ground up in C
+by implementing my own event loop, and building an HTTP server on top to use as
+the base for the URL shortener.
+
+## The event loop
+
+Lander consists of three layers: the event loop, the HTTP loop and finally the
+Lander-specific code. Each of these layers utilizes the layer below it, with
+the event loop being the bottom-most layer. This layer directly interacts with
+the networking stack and ensures bytes are received from and written to the
+client. The book [Build Your Own Redis](https://build-your-own.org/redis/) by
+James Smith was an excellent starting point, and I highly recommend checking it
+out! This book taught me everything I needed to know to start this project.
+
+Now for a slightly more technical dive into the inner workings of the event
+loop. The event loop is the layer that listens on the TCP socket for incoming
+connections and directly processes requests. In each iteration of the
+event loop, the following steps are taken:
+
+1. For each of the open connections:
+    1. Perform network I/O
+    2. Execute data processing code, provided by the upper layers
+    3. Close finished connections
+2. Accept a new connection if needed
+
+The event loop runs on a single thread, and constantly goes through this cycle
+to process requests. Here, the "data processing code" is a set of function
+pointers passed to the event loop that get executed at specific times. This is
+how the HTTP loop is able to inject its functionality into the event loop.
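
A minimal sketch of how such hooks could be wired up, assuming a hypothetical `event_loop_hooks` struct of function pointers and a `conn` type that tracks each connection's state (these names are illustrative, not Lander's actual API). Network I/O (step 1.1) and accepting connections (step 2) are left out; it only shows how one pass dispatches to the upper layer and then closes finished connections.

```c
#include <stddef.h>
#include <unistd.h>

/* Sketch only: names are illustrative, not Lander's API. */
typedef enum { CONN_REQUEST, CONN_RESPONSE, CONN_END } conn_state;

typedef struct {
    int fd;           /* connection socket, or -1 if none */
    conn_state state;
    void *ctx;        /* per-connection data owned by the upper layer */
} conn;

/* Hypothetical hooks the upper layer (e.g. the HTTP loop) hands to the event loop. */
typedef struct {
    void (*handle_data)(conn *c);  /* required: consume the read buffer */
    void (*write_data)(conn *c);   /* required: fill the write buffer */
    void (*ctx_free)(void *ctx);   /* optional: release ctx on close */
} event_loop_hooks;

/* One pass over the open connections. Returns how many are still open. */
size_t event_loop_process(conn *conns, size_t n, const event_loop_hooks *hooks) {
    /* Step 1.2: run the upper layer's code depending on the state. */
    for (size_t i = 0; i < n; i++) {
        if (conns[i].state == CONN_REQUEST)
            hooks->handle_data(&conns[i]);
        else if (conns[i].state == CONN_RESPONSE)
            hooks->write_data(&conns[i]);
    }

    /* Step 1.3: close finished connections, compacting the array in place. */
    size_t open = 0;
    for (size_t i = 0; i < n; i++) {
        if (conns[i].state == CONN_END) {
            if (conns[i].fd >= 0)
                close(conns[i].fd);
            if (hooks->ctx_free != NULL)
                hooks->ctx_free(conns[i].ctx);
        } else {
            conns[open++] = conns[i];
        }
    }
    return open;
}

/* Demo hooks (hypothetical): answer every request once, then end the connection. */
static void demo_handle(conn *c) { c->state = CONN_RESPONSE; }
static void demo_write(conn *c) { c->state = CONN_END; }
static const event_loop_hooks demo_hooks = { demo_handle, demo_write, NULL };
```

Compacting the array keeps each pass linear in the number of connections without extra allocations; how the real event loop stores its connections may differ.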
+
+In the event loop, a connection can be in one of three states: `request`,
+`response`, or `end`. In `request` mode, the event loop tries to read incoming
+data from the client into the read buffer. This read buffer is then used by the
+data processing code's data handler. In `response` mode, the data processing
+code's data writer is called, which populates the write buffer. This buffer is
+then written to the connection socket. Finally, the `end` state simply tells
+the event loop that the connection should be closed without any further
+processing. A connection can switch between `request` and `response` mode as
+many times as needed, allowing connections to be reused for multiple requests
+from the same client.
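
The three states map onto a small state machine. The sketch below assumes non-blocking sockets that `poll()` (or similar) has already reported as ready, fixed-size buffers, and a hypothetical line-echo handler (`demo_handle`) standing in for the upper layer's data handler; a line reading `quit` moves the connection to `end`.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

#define BUF_CAP 4096

typedef enum { CONN_REQUEST, CONN_RESPONSE, CONN_END } conn_state;

/* Hypothetical connection layout, for illustration only. */
typedef struct {
    int fd;                 /* non-blocking socket, assumed ready per poll() */
    conn_state state;
    char rbuf[BUF_CAP];
    size_t rlen;            /* bytes currently in the read buffer */
    char wbuf[BUF_CAP];
    size_t wlen, wsent;     /* bytes queued / bytes already sent */
} conn;

static int would_block(void) {
    return errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR;
}

/* request: read what is available; EOF or a hard error ends the connection. */
static void conn_read(conn *c) {
    if (c->rlen == BUF_CAP) { c->state = CONN_END; return; }  /* request too large */
    ssize_t n = recv(c->fd, c->rbuf + c->rlen, BUF_CAP - c->rlen, 0);
    if (n > 0) c->rlen += (size_t)n;
    else if (n == 0 || !would_block()) c->state = CONN_END;
}

/* response: flush the write buffer; once empty, switch back to request. */
static void conn_write(conn *c) {
    ssize_t n = send(c->fd, c->wbuf + c->wsent, c->wlen - c->wsent, 0);
    if (n > 0) c->wsent += (size_t)n;
    else if (n < 0 && !would_block()) { c->state = CONN_END; return; }
    if (c->wsent == c->wlen) {      /* connection can be reused */
        c->wlen = c->wsent = 0;
        c->state = CONN_REQUEST;
    }
}

/* Hypothetical data handler: echo one line back, or end on "quit". */
static void demo_handle(conn *c) {
    char *nl = memchr(c->rbuf, '\n', c->rlen);
    if (nl == NULL) return;         /* need more data */
    size_t line = (size_t)(nl - c->rbuf) + 1;
    if (line == 5 && memcmp(c->rbuf, "quit\n", 5) == 0) { c->state = CONN_END; return; }
    memcpy(c->wbuf, c->rbuf, line);
    c->wlen = line;
    c->wsent = 0;
    memmove(c->rbuf, c->rbuf + line, c->rlen - line);
    c->rlen -= line;
    c->state = CONN_RESPONSE;
}

/* One event-loop pass for a single connection. */
void conn_step(conn *c) {
    if (c->state == CONN_REQUEST) {
        conn_read(c);
        if (c->state == CONN_REQUEST) demo_handle(c);
    } else if (c->state == CONN_RESPONSE) {
        conn_write(c);
    }
}
```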
+
+The event loop provides all the building blocks needed to build a
+client-server type application. These are then used to implement the next
+layer: the HTTP loop.
+
+## The HTTP loop
+
+Before we can design a specific HTTP-based application, we need a base to build
+on. This base is the HTTP loop. It handles both serializing and deserializing
+of HTTP requests & responses, along with providing commonly used functionality,
+such as bearer authentication and reading & writing files to & from disk. The
+request parser is provided by the excellent
+[picohttpparser](https://github.com/h2o/picohttpparser) library. The parsed
+request is stored in the request's data struct, providing access to this data
+for all necessary functions.
+
+The HTTP loop defines a request handler function which is passed to the event
+loop as the data handler. This function first tries to parse the request,
+before routing it accordingly. Routes can be matched either by a literal string
+comparison or by a regular expression.
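
As an illustration of the two matching modes, here is a sketch using POSIX `<regex.h>`; whether Lander relies on this exact library is an assumption, and the `route` struct and its `handler_id` field are hypothetical stand-ins for a route and its list of steps. Regexes are compiled once up front, so matching a request only calls `regexec`.

```c
#include <regex.h>
#include <string.h>

typedef enum { ROUTE_LITERAL, ROUTE_REGEX } route_kind;

/* Hypothetical route description. */
typedef struct {
    route_kind kind;
    const char *method;
    const char *path;    /* literal path, or a POSIX extended regex */
    int handler_id;      /* stand-in for the route's steps */
    regex_t re;          /* compiled pattern for ROUTE_REGEX */
} route;

/* Compile every regex route once at startup; returns 0 on success.
   Routes live for the whole process, so they are never freed here. */
int routes_compile(route *routes, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (routes[i].kind == ROUTE_REGEX &&
            regcomp(&routes[i].re, routes[i].path, REG_EXTENDED | REG_NOSUB) != 0)
            return -1;
    return 0;
}

/* The first route whose method and path match wins; NULL if none does. */
const route *route_match(const route *routes, size_t n,
                         const char *method, const char *path) {
    for (size_t i = 0; i < n; i++) {
        const route *r = &routes[i];
        if (strcmp(r->method, method) != 0)
            continue;
        if (r->kind == ROUTE_LITERAL ? strcmp(r->path, path) == 0
                                     : regexec(&r->re, path, 0, NULL, 0) == 0)
            return r;
    }
    return NULL;
}
```

For example, a literal `GET /` route could sit next to a regex route such as `^/[a-zA-Z0-9]+$` for short keys (both paths are made up for illustration).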
+
+Each route consists of one or more steps. Each of these steps is a function
+that tries to advance the processing of the current request. The return value
+of these steps tells the HTTP loop whether the step has finished its task or if
+it's still waiting for I/O. The latter instructs the HTTP loop to skip this
+request for now, delaying its processing until the next cycle of the HTTP loop.
+In each cycle of the HTTP loop (or rather, the event loop), a request advances
+as far as it can by executing its steps in order, stopping at the first step
+that has to wait for I/O. This means that very small requests can be completely
+processed within a single cycle of the HTTP loop. Common functionality is
+provided as predefined steps. One example is the `http_loop_step_body_to_buf`
+step, which reads the request body into a buffer.
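
The step mechanism can be sketched as a NULL-terminated array of function pointers plus an index into it. The names below (`step_fn`, `STEP_WAIT`, `request_advance`) are hypothetical; `step_read_body` only mimics the idea behind `http_loop_step_body_to_buf` by waiting until the full body has arrived, and is not its implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step result: finished, or waiting for more I/O. */
typedef enum { STEP_DONE, STEP_WAIT } step_result;

typedef struct request request;
typedef step_result (*step_fn)(request *req);

struct request {
    const step_fn *steps;    /* NULL-terminated steps of the matched route */
    size_t current;          /* index of the next step to run */
    size_t content_length;   /* from the parsed Content-Length header */
    size_t body_len;         /* body bytes received so far (set by the I/O layer) */
    int status;              /* response status chosen by the route */
};

/* Waits for I/O: done only once the whole body has been received. */
static step_result step_read_body(request *req) {
    return req->body_len >= req->content_length ? STEP_DONE : STEP_WAIT;
}

/* Pure computation: never has to wait. */
static step_result step_respond(request *req) {
    req->status = 200;
    return STEP_DONE;
}

static const step_fn demo_route[] = { step_read_body, step_respond, NULL };

/* Run as many steps as possible, in order. Returns true once every step has
   finished; false means "skip this request until the next cycle". */
bool request_advance(request *req) {
    while (req->steps[req->current] != NULL) {
        if (req->steps[req->current](req) == STEP_WAIT)
            return false;
        req->current++;
    }
    return true;
}
```

A request whose body has not fully arrived stops at the first step and resumes from the same index on the next cycle, while a request with an empty body runs through every step in a single call.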
+
+The HTTP loop also provides the data writer functionality, which will stream an
+HTTP response to the write buffer. The contents of the response are tracked in
+the request's data struct, and these data structs are recycled between requests
+using the same connection, preventing unnecessary allocations.
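
A sketch of a streaming writer under simple assumptions: the hypothetical `response` struct serializes its status line and a `Content-Length` header once, then copies as much of the header and body as fits into the write buffer on each call, remembering its progress in `sent`. `response_reset` shows recycling in its simplest form: the same struct is cleared and reused for the next request on the connection.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-request response state. */
typedef struct {
    int status;
    const char *reason;      /* e.g. "OK"; assumed short */
    const char *body;
    size_t body_len;
    char head[256];          /* serialized status line + headers */
    size_t head_len;         /* 0 until the head has been serialized */
    size_t sent;             /* bytes of head + body already streamed */
} response;

/* Copy the next part of the response into wbuf (at most cap bytes).
   Returns the number of bytes written; 0 means nothing is left to write. */
size_t response_write(response *res, char *wbuf, size_t cap) {
    if (res->head_len == 0) {
        int len = snprintf(res->head, sizeof res->head,
                           "HTTP/1.1 %d %s\r\nContent-Length: %zu\r\n\r\n",
                           res->status, res->reason, res->body_len);
        if (len < 0 || (size_t)len >= sizeof res->head)
            return 0;        /* head does not fit: give up in this sketch */
        res->head_len = (size_t)len;
    }
    size_t total = res->head_len + res->body_len, n = 0;
    while (n < cap && res->sent < total) {
        const char *src;
        size_t avail;
        if (res->sent < res->head_len) {          /* still streaming the head */
            src = res->head + res->sent;
            avail = res->head_len - res->sent;
        } else {                                  /* streaming the body */
            src = res->body + (res->sent - res->head_len);
            avail = total - res->sent;
        }
        size_t chunk = avail < cap - n ? avail : cap - n;
        memcpy(wbuf + n, src, chunk);
        n += chunk;
        res->sent += chunk;
    }
    return n;
}

/* Recycle the struct for the next request on the same connection. */
void response_reset(response *res) {
    memset(res, 0, sizeof *res);
}
```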
+
+## Lander
+
+Above the HTTP loop layer, we finally reach the code specific to Lander. It
+might not surprise you that this layer is the smallest of the three, as the
+abstractions below allow it to focus on the task at hand: serving and storing
+HTTP redirects (and pastes). The way these are stored, however, is in my
+opinion rather interesting.
+
+For our Algorithms & Datastructures 3 course, we had to design three different
+trie implementations in C: a Patricia trie, a ternary trie and a "custom" trie,
+where we were allowed to experiment with different ideas. For those unfamiliar,
+a trie is a tree-like data structure used for storing strings. The keys used in
+this tree are the strings themselves, with each character causing the tree to
+branch off. Each string is stored at depth `m`, with `m` being the length of
+the string. This also means that the search depth of a string depends not on
+the size of the trie, but only on the length of the string! This allows for
+extremely fast lookup times for short keys, even if we have a large number of
+entries.
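
To make the depth argument concrete, here is a textbook trie sketch with one child pointer per ASCII character (an assumption that keeps the code short at the cost of memory). `trie_search` follows exactly one edge per character, so looking up a key of length `m` takes at most `m` steps regardless of how many entries are stored.

```c
#include <stdlib.h>

#define TRIE_ALPHABET 128   /* assumption: keys are 7-bit ASCII */

typedef struct trie_node {
    struct trie_node *children[TRIE_ALPHABET];
    const char *value;      /* non-NULL if a key ends at this node */
} trie_node;

/* Returns 0 on success, -1 for non-ASCII keys or allocation failure. */
int trie_insert(trie_node *root, const char *key, const char *value) {
    trie_node *n = root;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        if (*p >= TRIE_ALPHABET)
            return -1;
        if (n->children[*p] == NULL) {
            n->children[*p] = calloc(1, sizeof(trie_node));
            if (n->children[*p] == NULL)
                return -1;
        }
        n = n->children[*p];
    }
    n->value = value;
    return 0;
}

/* One edge per character: at most strlen(key) steps, no matter how many
   keys the trie holds. */
const char *trie_search(const trie_node *root, const char *key) {
    const trie_node *n = root;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0' && n != NULL; p++)
        n = *p < TRIE_ALPHABET ? n->children[*p] : NULL;
    return n != NULL ? n->value : NULL;
}

/* Free a heap-allocated node and everything below it. */
void trie_free(trie_node *n) {
    if (n == NULL)
        return;
    for (size_t i = 0; i < TRIE_ALPHABET; i++)
        trie_free(n->children[i]);
    free(n);
}
```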
+
+My design ended up being a combination of both a Patricia and a ternary trie: a
+ternary trie that supports skips the way a Patricia trie does. I ended up
+taking this final design and adapting it for this project by optimising it (or
+at least trying to) for shorter keys. This trie structure is stored completely in
+memory, allowing for very low response times for redirects. Pastes are served
+from disk, but their lookup is also performed using the same in-memory trie.
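
The following is a rough sketch of a ternary trie with Patricia-style skips, written from the description above rather than from Lander's source: each node stores a label whose first character is compared ternary-style (lower, equal, higher), while the remaining characters form a skip that must match exactly. Inserting a key that diverges inside a skip splits the node; deletion is not supported, and all names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only: allocation failures are not handled. */
typedef struct tst_node {
    char *label;                    /* label[0] is compared ternary-style, */
    size_t len;                     /* label[1..len) is the skip */
    struct tst_node *lo, *eq, *hi;
    const char *value;              /* non-NULL if a key ends at this node */
} tst_node;

static tst_node *node_new(const char *label, size_t len, const char *value) {
    tst_node *n = calloc(1, sizeof *n);
    n->label = malloc(len);
    memcpy(n->label, label, len);
    n->len = len;
    n->value = value;
    return n;
}

/* Keep the first j label chars in n and move the rest into a new eq child. */
static void node_split(tst_node *n, size_t j) {
    tst_node *tail = node_new(n->label + j, n->len - j, n->value);
    tail->eq = n->eq;
    n->eq = tail;
    n->len = j;
    n->value = NULL;
}

/* Insert a non-empty key; returns the (possibly new) root. */
tst_node *tst_insert(tst_node *root, const char *key, const char *value) {
    tst_node **slot = &root;
    size_t klen = strlen(key), i = 0;
    while (i < klen) {
        tst_node *n = *slot;
        if (n == NULL) {            /* the rest of the key becomes one skip */
            *slot = node_new(key + i, klen - i, value);
            break;
        }
        if (key[i] < n->label[0]) { slot = &n->lo; continue; }
        if (key[i] > n->label[0]) { slot = &n->hi; continue; }
        size_t j = 1;               /* match as much of the skip as possible */
        while (j < n->len && i + j < klen && key[i + j] == n->label[j])
            j++;
        if (j < n->len)
            node_split(n, j);
        i += j;
        if (i == klen) {
            n->value = value;
            break;
        }
        slot = &n->eq;
    }
    return root;
}

const char *tst_search(const tst_node *n, const char *key) {
    size_t klen = strlen(key), i = 0;
    while (n != NULL && i < klen) {
        if (key[i] < n->label[0]) { n = n->lo; continue; }
        if (key[i] > n->label[0]) { n = n->hi; continue; }
        if (klen - i < n->len || memcmp(key + i, n->label, n->len) != 0)
            return NULL;            /* the skipped chars don't match */
        i += n->len;
        if (i == klen)
            return n->value;
        n = n->eq;
    }
    return NULL;
}

void tst_free(tst_node *n) {
    if (n == NULL)
        return;
    tst_free(n->lo);
    tst_free(n->eq);
    tst_free(n->hi);
    free(n->label);
    free(n);
}
```

With short keys, most lookups touch only a handful of nodes, because a whole run of characters without branching is compared in a single `memcmp`.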
+
+## What's next?
+
+Hopefully the above explanation provides some insight into the inner workings
+of Lander. For those interested, the source code is of course available
+[here](https://git.rustybever.be/Chewing_Bever/lander). I'm not quite done with
+this project though.
+
+My current vision is to have Lander be my personal URL shortener, pastebin &
+file-sharing service. Considering a pastebin is basically a file-sharing
+service for text files specifically, I'd like to combine these into a single
+concept. The goal is to rework the storage system to support arbitrarily large
+files, and to allow storing generic metadata for each entry. The initial
+use case for this metadata would be storing the content type of uploaded files,
+allowing the correct `Content-Type` header to be served when retrieving them.
+Combined with support for large files, this would turn Lander into a WeTransfer
+alternative! Besides this, password protection and expiration of pastes are on
+my to-do list as well. The data structure currently doesn't support removing
+elements either, so this would need to be added in order to support expiration.
+
+Hopefully a follow-up post announcing these changes will come soon ;)