lander: added post
ci/woodpecker/push/woodpecker Pipeline was successful
parent ac6b2cab5b
commit 0897a275ee
@@ -8,7 +8,7 @@ pygmentsUseClasses = true
[params]
description = "The Rusty Bever"
copyright = "Copyright © 2022 Jef Roosens"
copyright = "Copyright © 2023 Jef Roosens"
dark = "auto"
highlight = true

@@ -4,8 +4,8 @@ title: "Links"
### Vieter

Vieter is an implementation of an Arch repository server written in V, combined
with a build system.
An implementation of an Arch repository server combined with a build system,
written in V.

* [Source](https://git.rustybever.be/vieter-v/vieter)
* [Docs](/docs/vieter)

@@ -13,8 +13,8 @@ with a build system.
### Alex

Alex is a Rust program that wraps a Minecraft server process and automates
creating incremental backups.
Minecraft server process wrapper that automates creating (incremental) backups,
written in Rust.

* [Source](https://git.rustybever.be/Chewing_Bever/alex)

@@ -25,3 +25,9 @@ that I've designed to update the hosted files using POST requests from my CI.
* [Backend Source](https://git.rustybever.be/Chewing_Bever/site-backend)
* [Blog Source](https://git.rustybever.be/Chewing_Bever/site)

## Lander

My home-grown URL shortener & pastebin, written from the ground up in C.

* [Source](https://git.rustybever.be/Chewing_Bever/lander)

@@ -0,0 +1,138 @@
---
title: "Designing my own URL shortener"
date: 2023-10-14
---

One of the projects I've always found to be a good choice for a side project is
a URL shortener. The core idea is simple and fairly easy to implement, yet it
leaves a lot of room for creativity in how you implement it. Once you're done
with the core idea, you can start expanding the project as you wish: expiring
links, password protection, or perhaps a management API. The possibilities are
endless!

Naturally, this post talks about my own version of a URL shortener:
[Lander](https://git.rustybever.be/Chewing_Bever/lander). To add some extra
challenge to the project, I've chosen to write it from the ground up in C by
implementing my own event loop and building an HTTP server on top of it to serve
as the base for the URL shortener.

## The event loop

Lander consists of three layers: the event loop, the HTTP loop and finally the
Lander-specific code. Each layer builds on the one below it, with the event
loop being the bottom-most layer. This layer interacts directly with the
networking stack and ensures bytes are received from and written to the client.
The book [Build Your Own Redis](https://build-your-own.org/redis/) by James
Smith was an excellent starting point, and I highly recommend checking it out!
This book taught me everything I needed to know to start this project.

Now for a slightly more technical dive into the inner workings of the event
loop. The event loop is the layer that listens on the TCP socket for incoming
connections and directly processes requests. In each iteration of the event
loop, the following steps are taken:

1. For each of the open connections:
   1. Perform network I/O
   2. Execute data processing code, provided by the upper layers
   3. Close finished connections
2. Accept a new connection if needed

The event loop runs on a single thread and constantly goes through this cycle
to process requests. Here, the "data processing code" is a set of function
pointers passed to the event loop that get executed at specific times. This is
how the HTTP loop is able to inject its functionality into the event loop.

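Here is a minimal C sketch that makes the cycle and the function-pointer hooks more concrete. It is not Lander's actual code. The names `struct conn`, `struct loop_hooks` and `event_loop_cycle` are hypothetical. The socket calls are replaced by stub hooks, so the control flow can be followed without a network.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types illustrating the cycle; not Lander's actual structs. */
enum conn_state { CONN_REQUEST, CONN_RESPONSE, CONN_END };

struct conn {
    int id;                /* stand-in for the socket file descriptor */
    enum conn_state state;
    void *data;            /* per-connection data owned by the upper layer */
};

/* Function pointers injected by the layer above (e.g. an HTTP loop). */
struct loop_hooks {
    void (*io)(struct conn *c);      /* read into / write from the buffers */
    void (*process)(struct conn *c); /* upper-layer data processing */
    bool (*accept)(struct conn *c);  /* fills *c if a client is waiting */
};

#define MAX_CONNS 16

struct event_loop {
    struct conn conns[MAX_CONNS];
    size_t n_conns;
    struct loop_hooks hooks;
};

/* One iteration of the cycle: steps 1.1 to 1.3 per connection, then step 2. */
void event_loop_cycle(struct event_loop *el) {
    size_t kept = 0;
    for (size_t i = 0; i < el->n_conns; i++) {
        struct conn c = el->conns[i];
        el->hooks.io(&c);          /* 1.1 perform network I/O */
        el->hooks.process(&c);     /* 1.2 run the upper layer's code */
        if (c.state != CONN_END)   /* 1.3 drop finished connections */
            el->conns[kept++] = c; /*     (a real loop would close() the fd) */
    }
    el->n_conns = kept;

    /* 2. accept at most one new connection per cycle, if there is room */
    struct conn nc = { .id = -1, .state = CONN_REQUEST, .data = NULL };
    if (el->n_conns < MAX_CONNS && el->hooks.accept(&nc))
        el->conns[el->n_conns++] = nc;
}

/* Demo hooks: each connection moves request -> response -> end on
   consecutive cycles, and a new client is "accepted" every cycle. */
static int next_id = 0;
static void demo_io(struct conn *c) { (void)c; }
static void demo_process(struct conn *c) {
    c->state = (c->state == CONN_REQUEST) ? CONN_RESPONSE : CONN_END;
}
static bool demo_accept(struct conn *c) { c->id = next_id++; return true; }
```

A real loop would call `read()`/`write()` in the I/O hook and `accept()` on the listening socket. The demo hooks only exist so the cycle can run on its own.
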
In the event loop, a connection can be in one of three states: `request`,
`response`, or `end`. In `request` mode, the event loop tries to read incoming
data from the client into the read buffer, which is then consumed by the data
processing code's data handler. In `response` mode, the data processing code's
data writer is called, which populates the write buffer. This buffer is then
written to the connection socket. Finally, the `end` state simply tells the
event loop that the connection should be closed without any further
processing. A connection can switch between `request` and `response` mode as
many times as needed, allowing connections to be reused for multiple requests
from the same client.

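The three states can be sketched as a small state machine. This is a hypothetical illustration, not Lander's implementation. `conn_advance`, the fixed-size buffers and the line-based ping/pong protocol used by the demo handlers are all made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connection layout; buffer sizes are arbitrary. */
enum conn_state { CONN_REQUEST, CONN_RESPONSE, CONN_END };

struct conn {
    enum conn_state state;
    char rbuf[256];
    size_t rbuf_len;
    char wbuf[256];
    size_t wbuf_len;
};

struct conn_handlers {
    void (*data_handler)(struct conn *c); /* consumes rbuf in REQUEST mode */
    void (*data_writer)(struct conn *c);  /* fills wbuf in RESPONSE mode */
};

/* Called by the event loop after new bytes were read into c->rbuf. If the
   handler switches to RESPONSE, the writer runs in the same call. */
void conn_advance(struct conn *c, const struct conn_handlers *h) {
    if (c->state == CONN_REQUEST)
        h->data_handler(c);
    if (c->state == CONN_RESPONSE)
        h->data_writer(c); /* the loop would then write wbuf to the socket */
}

/* Demo handler: a request is one line; "quit" ends the connection. */
static void ping_handler(struct conn *c) {
    if (memchr(c->rbuf, '\n', c->rbuf_len) == NULL)
        return;                  /* incomplete request: stay in REQUEST */
    if (c->rbuf_len >= 4 && memcmp(c->rbuf, "quit", 4) == 0)
        c->state = CONN_END;     /* close without further processing */
    else
        c->state = CONN_RESPONSE;
}

/* Demo writer: answers "pong" and goes back to REQUEST for the next one. */
static void pong_writer(struct conn *c) {
    memcpy(c->wbuf, "pong\n", 5);
    c->wbuf_len = 5;
    c->rbuf_len = 0;             /* request consumed */
    c->state = CONN_REQUEST;     /* keep the connection open for reuse */
}
```

For brevity, the demo writer switches back to `request` as soon as it has filled the write buffer. A real event loop would only do so once the buffer has been fully written to the socket.
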
The event loop provides all the building blocks needed to build a client-server
application. These are then used to implement the next layer: the HTTP loop.

## The HTTP loop

Before we can design a specific HTTP-based application, we need a base to build
on. This base is the HTTP loop. It handles serializing and deserializing HTTP
requests & responses, and provides commonly used functionality such as bearer
authentication and reading & writing files to & from disk. The request parser
is provided by the excellent
[picohttpparser](https://github.com/h2o/picohttpparser) library. The parsed
request is stored in the request's data struct, giving every function that
needs it access to this data.

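Below is a sketch of one helper such a layer might offer: a bearer token check. The function name and signature are assumptions. The only detail borrowed from picohttpparser is that it exposes header values as pointer/length pairs (`struct phr_header`) rather than NUL-terminated strings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: checks an Authorization header value of the form
   "Bearer <token>" against the configured token. The value is passed with
   an explicit length because it is not NUL-terminated. */
bool bearer_auth_ok(const char *value, size_t value_len, const char *token) {
    static const char prefix[] = "Bearer ";
    const size_t prefix_len = sizeof(prefix) - 1;
    const size_t token_len = strlen(token);

    /* Length check first, so the memcmp below never reads out of bounds. */
    if (value_len != prefix_len + token_len ||
        memcmp(value, prefix, prefix_len) != 0)
        return false;

    /* Compare every byte so the running time does not depend on where the
       first mismatch is (a basic guard against timing attacks). */
    unsigned char diff = 0;
    for (size_t i = 0; i < token_len; i++)
        diff |= (unsigned char)(value[prefix_len + i] ^ token[i]);
    return diff == 0;
}
```

The scheme name is matched case-sensitively here for simplicity, even though HTTP treats authentication scheme names as case-insensitive.
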
The HTTP loop defines a request handler function which is passed to the event
loop as the data handler. This function first tries to parse the request,
before routing it accordingly. Routes can be matched either by a literal string
or by a regular expression.

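The two routing flavours could live in a single route table, sketched below with the POSIX `<regex.h>` API (`regcomp`/`regexec`). The post does not say which regex engine Lander uses. Treat the table layout and the `routes_init`/`route_match`/`routes_free` names as hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical route table: literal paths or POSIX extended regexes. */
enum route_type { ROUTE_LITERAL, ROUTE_REGEX };

struct route {
    enum route_type type;
    const char *method;
    const char *pattern;
    regex_t regex; /* compiled once at startup for ROUTE_REGEX */
};

/* Compiles all regex routes; returns false if any pattern is invalid. */
bool routes_init(struct route *routes, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (routes[i].type == ROUTE_REGEX &&
            regcomp(&routes[i].regex, routes[i].pattern,
                    REG_EXTENDED | REG_NOSUB) != 0)
            return false;
    return true;
}

/* Returns the index of the first matching route, or -1 if none matches.
   path must be NUL-terminated for regexec. */
long route_match(const struct route *routes, size_t n,
                 const char *method, const char *path) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(routes[i].method, method) != 0)
            continue;
        if (routes[i].type == ROUTE_LITERAL
                ? strcmp(routes[i].pattern, path) == 0
                : regexec(&routes[i].regex, path, 0, NULL, 0) == 0)
            return (long)i;
    }
    return -1;
}

/* Frees compiled patterns; only valid after a successful routes_init. */
void routes_free(struct route *routes, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (routes[i].type == ROUTE_REGEX)
            regfree(&routes[i].regex);
}
```

Each pattern is compiled once at startup. That keeps the per-request cost to one `regexec` call per candidate route, and literal routes only need a plain `strcmp`.
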
Each route consists of one or more steps. Each of these steps is a function
that tries to advance the processing of the current request. A step's return
value tells the HTTP loop whether the step has finished its task or is still
waiting for I/O. The latter instructs the HTTP loop to skip this request for
now, delaying its processing until the next cycle of the HTTP loop. In each
cycle of the HTTP loop (or rather, the event loop), a request advances its
processing as far as possible by executing as many steps as it can, in order.
This means that very small requests can be completely processed within a single
cycle of the HTTP loop. Common functionality is provided as predefined steps.
One example is the `http_loop_step_body_to_buf` step, which reads the request
body into a buffer.

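At its core, the step mechanism is a resumable loop over an array of function pointers. The sketch below assumes a step returns `true` when it is done and `false` when it needs more I/O. `run_steps`, `struct request_ctx` and `step_body_to_buf` are illustrative names; the last one only loosely mirrors `http_loop_step_body_to_buf`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request context; a real one would hold much more. */
struct request_ctx {
    size_t current_step;   /* index of the next step to run */
    const char *in;        /* bytes the event loop has read so far */
    size_t in_len;
    size_t in_used;
    char body[64];
    size_t body_len;
    size_t body_expected;  /* e.g. parsed from Content-Length; must be <= 64 */
    bool responded;
};

/* A step returns true once finished, false if it must wait for more I/O. */
typedef bool (*step_fn)(struct request_ctx *ctx);

/* Runs as many steps as possible, in order. Returns true once all steps
   have completed; false means "resume at the same step next cycle". */
bool run_steps(struct request_ctx *ctx, const step_fn *steps, size_t n_steps) {
    while (ctx->current_step < n_steps) {
        if (!steps[ctx->current_step](ctx))
            return false;  /* waiting for I/O */
        ctx->current_step++;
    }
    return true;
}

/* Example step: copies whatever body bytes are available into ctx->body. */
static bool step_body_to_buf(struct request_ctx *ctx) {
    size_t avail = ctx->in_len - ctx->in_used;
    size_t missing = ctx->body_expected - ctx->body_len;
    size_t n = avail < missing ? avail : missing;
    if (n > 0) {
        memcpy(ctx->body + ctx->body_len, ctx->in + ctx->in_used, n);
        ctx->body_len += n;
        ctx->in_used += n;
    }
    return ctx->body_len == ctx->body_expected;
}

/* Example step: marks the response as ready. */
static bool step_respond(struct request_ctx *ctx) {
    ctx->responded = true;
    return true;
}
```

The index of the next step is stored in the request context. A request that is waiting for I/O therefore resumes at the same step on the next cycle.
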
The HTTP loop also provides the data writer functionality, which streams an
HTTP response to the write buffer. The contents of the response are tracked in
the request's data struct, and these data structs are recycled between requests
on the same connection, preventing unnecessary allocations.

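Recycling the per-request data can be as simple as resetting its fields while keeping any heap buffers around. The struct and function names in this sketch are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request data; only the parts relevant to reuse. */
struct http_req_data {
    int status;
    char *body;       /* heap buffer, kept across requests */
    size_t body_len;
    size_t body_cap;
};

/* Ensures the body buffer can hold `needed` bytes, growing it only when
   the existing allocation is too small. */
bool req_data_reserve(struct http_req_data *d, size_t needed) {
    if (needed <= d->body_cap)
        return true;
    char *p = realloc(d->body, needed); /* acts as malloc when body is NULL */
    if (p == NULL)
        return false;
    d->body = p;
    d->body_cap = needed;
    return true;
}

/* Prepares the struct for the next request on the same connection
   without freeing its buffer. */
void req_data_reset(struct http_req_data *d) {
    d->status = 0;
    d->body_len = 0;  /* allocation and capacity are kept */
}

/* Releases everything once the connection itself is closed. */
void req_data_free(struct http_req_data *d) {
    free(d->body);
    memset(d, 0, sizeof(*d));
}
```

`req_data_reserve` only touches the allocator when a later request needs a larger body buffer than any earlier request on that connection.
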
## Lander

Above the HTTP loop layer, we finally reach the code specific to Lander. It
might not surprise you that this layer is the smallest of the three, as the
abstractions below allow it to focus on the task at hand: storing and serving
HTTP redirects (and pastes). The way these are stored, however, is in my
opinion rather interesting.

For our Algorithms & Data Structures 3 course, we had to design three different
trie implementations in C: a Patricia trie, a ternary trie and a "custom" trie,
where we were allowed to experiment with different ideas. For those unfamiliar,
a trie is a tree-like data structure used for storing strings. The keys used in
this tree are the strings themselves, with each character causing the tree to
branch off. Each string is stored at depth `m`, with `m` being the length of
the string. This also means that the search depth of a string is bounded not by
the size of the trie, but by the length of the string! This allows for
extremely fast lookups for short keys, even if we have a large number of
entries.

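The depth property is easiest to see in the simplest possible trie, where every node has one child slot per character. The sketch below is a generic array-based trie over ASCII keys, not the structure Lander uses. It spends a lot of memory (128 pointers per node), and in exchange a lookup follows at most one edge per character of the key.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Generic array-based trie over 7-bit ASCII keys (illustration only). */
#define ALPHABET 128

struct trie_node {
    struct trie_node *children[ALPHABET];
    const char *value; /* non-NULL if a key ends at this node */
};

struct trie_node *trie_new(void) {
    return calloc(1, sizeof(struct trie_node));
}

/* Inserts key -> value; returns false on allocation failure or non-ASCII. */
bool trie_insert(struct trie_node *root, const char *key, const char *value) {
    struct trie_node *node = root;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        if (*p >= ALPHABET)
            return false;
        if (node->children[*p] == NULL &&
            (node->children[*p] = trie_new()) == NULL)
            return false;
        node = node->children[*p];
    }
    node->value = value;
    return true;
}

/* A key of length m is resolved after at most m steps, however many
   other keys the trie contains. */
const char *trie_lookup(const struct trie_node *root, const char *key) {
    const struct trie_node *node = root;
    for (const unsigned char *p = (const unsigned char *)key; *p && node; p++)
        node = *p < ALPHABET ? node->children[*p] : NULL;
    return node ? node->value : NULL;
}

void trie_free(struct trie_node *node) {
    if (node == NULL)
        return;
    for (size_t i = 0; i < ALPHABET; i++)
        trie_free(node->children[i]);
    free(node);
}
```
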
My design ended up being a combination of a Patricia and a ternary trie: a
ternary trie that supports skips the way a Patricia trie does. For this
project, I took this final design and modified it by optimising it (or at least
trying to) for shorter keys. This trie structure is stored completely in
memory, allowing for very low response times for redirects. Pastes are served
from disk, but their lookup is also performed using the same in-memory trie.

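A ternary search tree uses far less memory than the array-based layout. It stores a single character per node, with `lo`, `eq` and `hi` children. The sketch below shows only this ternary part. The Patricia-style skips and the short-key optimisations described above are left out, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Plain ternary search tree (no Patricia-style skips). */
struct tst_node {
    char c;
    struct tst_node *lo, *eq, *hi;
    const char *value; /* non-NULL if a key ends at this node */
};

/* Inserts key -> value. Returns false for an empty key or on allocation
   failure. */
bool tst_insert(struct tst_node **node, const char *key, const char *value) {
    if (*key == '\0')
        return false;
    while (true) {
        if (*node == NULL) {
            *node = calloc(1, sizeof(**node));
            if (*node == NULL)
                return false;
            (*node)->c = *key;
        }
        struct tst_node *n = *node;
        if (*key < n->c) {
            node = &n->lo;         /* sideways: smaller character */
        } else if (*key > n->c) {
            node = &n->hi;         /* sideways: larger character */
        } else if (key[1] == '\0') {
            n->value = value;      /* last character of the key */
            return true;
        } else {
            key++;                 /* match: consume one character */
            node = &n->eq;
        }
    }
}

const char *tst_lookup(const struct tst_node *n, const char *key) {
    if (*key == '\0')
        return NULL;
    while (n != NULL) {
        if (*key < n->c)
            n = n->lo;
        else if (*key > n->c)
            n = n->hi;
        else if (key[1] == '\0')
            return n->value;
        else {
            key++;
            n = n->eq;
        }
    }
    return NULL;
}

void tst_free(struct tst_node *n) {
    if (n == NULL)
        return;
    tst_free(n->lo);
    tst_free(n->eq);
    tst_free(n->hi);
    free(n);
}
```

Only the `eq` link consumes a character of the key; `lo` and `hi` links move sideways between alternatives for the same position. In a Patricia trie, skips additionally let a single node stand in for a chain of nodes that has no branching.
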
## What's next?

Hopefully the above explanation provides some insight into the inner workings
of Lander. For those interested, the source code is of course available
[here](https://git.rustybever.be/Chewing_Bever/lander). I'm not quite done with
this project though.

My current vision is for Lander to be my personal URL shortener, pastebin &
file-sharing service. Considering a pastebin is basically a file-sharing
service specifically for text files, I'd like to combine these into a single
concept. The goal is to rework the storage system to support arbitrarily large
files, and to allow storing generic metadata for each entry. The initial use
case for this metadata would be storing the content type of uploaded files,
allowing the `Content-Type` header to be served correctly when retrieving them.
Combined with support for large files, this would turn Lander into a
WeTransfer alternative! Besides this, password protection and expiration of
pastes are on my to-do list as well. The data structure currently doesn't
support removing elements either, so this would need to be added in order to
support expiration.

Hopefully a follow-up post announcing these changes will come soon ;)