fix rss feeds; move stuff around

Jef Roosens 2024-06-06 09:54:26 +02:00
parent cd10df9d32
commit 323b9e2e3c
Signed by: Jef Roosens
GPG key ID: B75D4F293C7052DB
9 changed files with 16 additions and 60 deletions

---
title: "Automating Minecraft Server Backups"
date: 2023-09-07
---
I started playing Minecraft back in 2012, after the release of version 1.2.5.
Like many gen Z'ers, I grew up playing the game day in, day out, and now 11
years later, I love the game more than ever. One of the main reasons I still
play the game is multiplayer, seeing the world evolve as the weeks go by with
everyone adding their own personal touches.
Naturally, as a nerd, I've grown the habit of hosting my own servers, as well
as maintaining instances for friends. Having managed these servers, I've
experienced the same problems that I've heard other people complaining about as
well: backing up the server.
{{< figure src="./the-village.jpg" title="Sneak peek of the village we live in" >}}
## The Problem
Like any piece of software, a Minecraft server instance writes files to disk,
and these files, a combination of world data and configuration files, are what
we wish to back up. The problem is that the server instance is constantly
writing new data to disk. This conflicts with the "just copy the files"
approach (e.g. `tar` or `rsync`), as these will often encounter errors because
they're trying to read a file that's actively being written to. Because the
server isn't aware it's being backed up, it's also possible it writes to a file
already read by the backup software while the other files are still being
processed. This produces an inconsistent backup with data files that do not
properly belong together.
There are two straightforward ways to solve this problem. One would be to
simply turn off the server before each backup. While this could definitely work
without too much interruption, provided the backups are scheduled at times when
no players are online, I don't find this to be very elegant.
The second solution is much more appealing. A Minecraft server can be
controlled using certain console commands, with the relevant ones here being
`save-off`, `save-all`, and `save-on`. `save-off` tells the server to stop
saving its data to disk, and cache it in memory instead. `save-all` flushes the
server's data to disk, and `save-on` enables writing to disk again. Combining
these commands provides us with a way to back up a live Minecraft server: turn
off saving using `save-off`, flush its data using `save-all`, back up the
files, and turn on saving again using `save-on`. With these tools at my
disposal, I started work on my own custom solution.
## My solution
After some brainstorming, I ended up with a fairly simple approach: spawn the
server process as a child process with the parent controlling the server's
stdin. By taking control of the stdin, we can send commands to the server
process as if we'd typed them into the terminal ourselves. I wrote the original
proof-of-concept over two years ago during the pandemic, but this ended up
sitting in a dead repository afterwards. However, a couple of months ago, some
new motivation to work on the project popped into my head (I started caring a
lot about our world), so I turned it into a fully fledged backup tool! The
project's called [alex](https://git.rustybever.be/Chewing_Bever/alex) and as
usual, it's open-source and available on my personal Gitea instance.
Although Alex is a lot more advanced now than it was a couple of months back,
it still functions on the same principle of injecting the above commands into
the server process's stdin. The real star of the show however is the way it
handles its backups, which brings us into the next section.
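alex itself is written in Rust, but the core trick (owning the child process's stdin and writing console commands to it) can be sketched in a few lines of C. This is a simplified illustration with made-up function names, not alex's actual code:

```c
#include <stdio.h>

/* Write one console command to the server's stdin, newline-terminated,
 * exactly as if an admin had typed it into the terminal.
 * Returns 0 on success. */
int send_command(FILE *server_stdin, const char *cmd) {
    if (fprintf(server_stdin, "%s\n", cmd) < 0)
        return -1;
    return fflush(server_stdin);
}

/* The backup sequence described above: stop autosaving, flush world
 * data to disk, (archive the files), then re-enable autosaving. */
int backup_sequence(FILE *server_stdin) {
    if (send_command(server_stdin, "save-off")) return -1;
    if (send_command(server_stdin, "save-all")) return -1;
    /* ... copy or archive the world files here ... */
    return send_command(server_stdin, "save-on");
}
```

In the real tool, the stream would be a pipe attached to the spawned server process's stdin (e.g. via `fork`/`exec` with `dup2`, or Rust's `Stdio::piped()`).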
## Incremental backups
You could probably describe my usual projects as overengineered, and Alex is no
different. Originally, Alex simply created a full tarball every `n` minutes
(powered by the lovely [tar-rs](https://github.com/alexcrichton/tar-rs)
library). While this definitely worked, it was *slow*. Compressing several
gigabytes of world files always takes some time, and this combined with shaky
hard drive speeds resulted in backups easily taking 5-10 minutes. Normally,
this wouldn't bother me too much, but with this solution, the Minecraft server
isn't writing to disk for the entire duration of this backup! If the server
crashed during this time, all this data would be lost.
This called for a better method: incremental backups. For those unfamiliar, an
incremental backup is a backup that only stores the changes that occurred since
the last backup. This not only saves a ton of disk space, but it also greatly
decreases the amount of data that needs to be compressed, speeding up the
backup process tremendously.
Along with this, I introduced the concept of "chains". Because an incremental
backup describes the changes that occurred since the last backup, it needs that
other backup in order to be fully restored. This also implies that the first
incremental backup needs to be based off a full backup. A chain defines a list
of sequential backups that all depend on the one before them, with each chain
starting with a full backup.
All of this combined resulted in the following configuration for backups: the
admin can configure one or more backup schedules, with each schedule being
defined by a name, a frequency, a chain length and how many chains to keep. For
each of these configurations, a new backup will be created periodically
according to the defined frequency, and this backup will be appended to the
current chain for that schedule. If the chain is full (as defined by the chain
length), a new chain is created. Finally, the admin can configure how many of
these full chains to keep.
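The chain bookkeeping boils down to two small rules: the first backup of every chain is a full backup, and a chain is rotated out once enough newer chains exist. A minimal model of this (hypothetical names, not alex's actual code):

```c
/* A backup schedule: how often to run, how many backups per chain,
 * and how many full chains to retain. */
typedef struct {
    int frequency_minutes;
    int chain_length;   /* backups per chain; 1 means full backups only */
    int chains_to_keep;
} schedule_t;

/* Given how many backups the current chain already holds, decide
 * whether the next backup starts a new chain (and is therefore full). */
int next_is_full(const schedule_t *s, int in_current_chain) {
    return in_current_chain == 0 || in_current_chain >= s->chain_length;
}

/* Total number of backups retained once the schedule reaches
 * steady state: every kept chain is completely filled. */
int max_backups(const schedule_t *s) {
    return s->chain_length * s->chains_to_keep;
}
```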
As an example, my server currently uses a dual-schedule system:
* One configuration is called "30min". As the name suggests, it has a frequency
of 30 minutes. It stores chains of length 48, and keeps 1 full chain. This
configuration allows me to create incremental backups (which take 5-10
seconds) every 30 minutes, and I can restore these backups in this 30-minute
granularity up to 24 hours back.
* The second configuration is called "daily", and this one simply creates a
full backup (a chain length of 1) every 24 hours, with 7 chains being stored.
This allows me to roll back a backup with a 24-hour granularity up to 7 days
back.
This configuration would've never been possible without incremental backups, as
the 30-minute backups would've simply taken too long otherwise. The required
disk space would've also been rather unwieldy, as I'd rather not store 48
multi-gigabyte backups per day. With the incremental backups system, each
backup after the initial full backup is only a few megabytes!
Of course, a tool like this wouldn't be complete without some management
utilities, so the Alex binary contains tools for restoring backups, exporting
incremental backups as a new full backup, and unpacking a backup.
## What's next?
There are still some improvements I'd like to add to Alex itself, notably making
Alex more aware of the server's internal state by parsing its logs, and making
restoring backups possible without having to stop the Alex instance (this is
rather cumbersome in Docker containers).
On a bigger scale however, there's another possible route to take: add a
central server component where an Alex instance can publish its backups to.
This server would then have a user management system to allow certain users of
the Minecraft server to have access to the backups for offline use. This server
could perhaps also show the logs of the server instance, as well as handle
syncing the backups to another location, such as an S3 store. This would make
the entire system more resistant to data loss.
Of course, I'm well aware these ideas are rather ambitious, but I'm excited to
see where this project might go next!
That being said, Alex is available as statically compiled binaries for `amd64`
and `arm64` [on my Gitea](https://git.rustybever.be/Chewing_Bever/alex). If
you're interested in following the project, Gitea recently added repository
[RSS feeds](https://git.rustybever.be/Chewing_Bever/alex.rss) ;)

---
title: "Designing my own URL shortener"
date: 2023-10-14
---
One of the projects I've always found to be a good choice for a side project is
a URL shortener. The core idea is simple and fairly easy to implement, yet it
allows for a lot of creativity in how you implement it. Once you're done with
the core idea, you can start expanding the project as you wish: expiring links,
password protection, or perhaps a management API. The possibilities are
endless!
Naturally, this post talks about my own version of a URL shortener:
[Lander](https://git.rustybever.be/Chewing_Bever/lander). In order to add some
extra challenge to the project, I've chosen to write it from the ground up in C
by implementing my own event loop, and building an HTTP server on top to use as
the base for the URL shortener.
## The event loop
Lander consists of three layers: the event loop, the HTTP loop and finally the
Lander-specific code. Each of these layers utilizes the layer below it, with
the event loop being the bottom-most layer. This layer directly interacts with
the networking stack and ensures bytes are received from and written to the
client. The book [Build Your Own Redis](https://build-your-own.org/redis/) by
James Smith was an excellent starting point, and I highly recommend checking it
out! This book taught me everything I needed to know to start this project.
Now for a slightly more technical dive into the inner workings of the event
loop. The event loop is the layer that listens on the listening TCP socket for
incoming connections and directly processes requests. In each iteration of the
event loop, the following steps are taken:
1. For each of the open connections:
   1. Perform network I/O
   2. Execute data processing code, provided by the upper layers
   3. Close finished connections
2. Accept a new connection if needed
The event loop runs on a single thread, and constantly goes through this cycle
to process requests. Here, the "data processing code" is a set of function
pointers passed to the event loop that get executed at specific times. This is
how the HTTP loop is able to inject its functionality into the event loop.
In the event loop, a connection can be in one of three states: `request`,
`response`, or `end`. In `request` mode, the event loop tries to read incoming
data from the client into the read buffer. This read buffer is then used by the
data processing code's data handler. In `response` mode, the data processing
code's data writer is called, which populates the write buffer. This buffer is
then written to the connection socket. Finally, the `end` state simply tells
the event loop that the connection should be closed without any further
processing. A connection can switch between `request` and `response` mode as
many times as needed, allowing connections to be reused for multiple requests
from the same client.
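The state machine and the injected function pointers can be sketched as follows. This is a stripped-down model (the I/O itself is omitted, and all names are made up for illustration, not taken from Lander's source):

```c
/* The three states a connection can be in. */
typedef enum { STATE_REQUEST, STATE_RESPONSE, STATE_END } conn_state;

typedef struct connection connection;

/* Function pointers supplied by the upper layer (the HTTP loop). */
typedef struct {
    /* Process buffered input; may switch the connection to STATE_RESPONSE. */
    void (*on_data)(connection *);
    /* Populate the write buffer; may switch back to STATE_REQUEST,
     * or to STATE_END if the connection should be closed. */
    void (*on_write)(connection *);
} handlers;

struct connection {
    conn_state state;
    const handlers *h;
    int requests_handled;
};

/* One iteration of the event loop's per-connection work. */
void conn_tick(connection *c) {
    switch (c->state) {
    case STATE_REQUEST:  c->h->on_data(c);  break;
    case STATE_RESPONSE: c->h->on_write(c); break;
    case STATE_END:      break; /* closed by the loop afterwards */
    }
}

/* A toy handler pair: answer one request, then wait for the next,
 * demonstrating how a connection is reused across requests. */
static void demo_on_data(connection *c)  { c->state = STATE_RESPONSE; }
static void demo_on_write(connection *c) {
    c->requests_handled++;
    c->state = STATE_REQUEST;
}
static const handlers demo = { demo_on_data, demo_on_write };
```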
The event loop provides all the necessary building blocks needed to build a
client-server type application. These are then used to implement the next
layer: the HTTP loop.
## The HTTP loop
Before we can design a specific HTTP-based application, we need a base to build
on. This base is the HTTP loop. It handles both serializing and deserializing
of HTTP requests & responses, along with providing commonly used functionality,
such as bearer authentication and reading & writing files to & from disk. The
request parser is provided by the excellent
[picohttpparser](https://github.com/h2o/picohttpparser) library. The parsed
request is stored in the request's data struct, providing access to this data
for all necessary functions.
The HTTP loop defines a request handler function which is passed to the event
loop as the data handler. This function first tries to parse the request,
before routing it accordingly. For routing, literal string matches or
RegEx-based routing is available.
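A route match of either kind can be sketched with POSIX regular expressions (a hedged, minimal illustration; Lander's actual routing table looks different):

```c
#include <regex.h>
#include <string.h>

/* A route pattern: either a literal path or a POSIX extended regex. */
typedef struct {
    const char *pattern;
    int is_regex;
} route;

/* Return 1 if the request path matches the route, 0 otherwise. */
int route_matches(const route *r, const char *path) {
    if (!r->is_regex)
        return strcmp(r->pattern, path) == 0;

    regex_t re;
    if (regcomp(&re, r->pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0; /* treat an invalid pattern as a non-match */
    int ok = regexec(&re, path, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```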
Each route consists of one or more steps. Each of these steps is a function
that tries to advance the processing of the current request. The return value
of these steps tells the HTTP loop whether the step has finished its task or if
it's still waiting for I/O. The latter instructs the HTTP loop to skip this
request for now, delaying its processing until the next cycle of the HTTP loop.
In each cycle of the HTTP loop (or rather, the event loop), a request will try
to advance its processing by as much as possible by executing as many steps as
possible, in order. This means that very small requests can be completely
processed within a single cycle of the HTTP loop. Common functionality is
provided as predefined steps. One example is the `http_loop_step_body_to_buf`
step, which reads the request body into a buffer.
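The step mechanism described above can be sketched like this (hypothetical names; a simplified model of the idea rather than Lander's real code):

```c
#include <stddef.h>

/* What a step reports back to the HTTP loop. */
typedef enum { STEP_DONE, STEP_WAITING } step_result;

typedef struct request request;
typedef step_result (*step_fn)(request *);

struct request {
    const step_fn *steps; /* NULL-terminated list of steps for the route */
    size_t current;       /* index of the next step to run */
};

/* Advance a request as far as possible in this cycle: run steps in
 * order until one reports it is still waiting on I/O, or none remain. */
void request_advance(request *r) {
    while (r->steps[r->current] != NULL) {
        if (r->steps[r->current](r) == STEP_WAITING)
            return; /* retry this same step next cycle */
        r->current++;
    }
}

/* Two toy steps: parsing always succeeds immediately, while reading
 * the body blocks until (simulated) I/O becomes ready. */
static int io_ready = 0;
static step_result step_parse(request *r)     { (void)r; return STEP_DONE; }
static step_result step_read_body(request *r) {
    (void)r;
    return io_ready ? STEP_DONE : STEP_WAITING;
}
```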
The HTTP loop also provides the data writer functionality, which will stream an
HTTP response to the write buffer. The contents of the response are tracked in
the request's data struct, and these data structs are recycled between requests
using the same connection, preventing unnecessary allocations.
## Lander
Above the HTTP loop layer, we finally reach the code specific to Lander. It
might not surprise you that this layer is the smallest of the three, as the
abstractions below allow it to focus on the task at hand: serving and storing
HTTP redirects (and pastes). The way these are stored however is, in my
opinion, rather interesting.
For our Algorithms & Datastructures 3 course, we had to design three different
trie implementations in C: a Patricia trie, a ternary trie and a "custom" trie,
where we were allowed to experiment with different ideas. For those unfamiliar,
a trie is a tree-like data structure used for storing strings. The keys used in
this tree are the strings themselves, with each character causing the tree to
branch off. Each string is stored at depth `m`, with `m` being the length of
the string. This also means that the search depth of a string is not bounded by
the size of the trie, but rather the size of the string! This allows for
extremely fast lookup times for short keys, even if we have a large number of
entries.
My design ended up being a combination of both a Patricia and a ternary trie: a
ternary trie that supports skips the way a Patricia trie does. I ended up
taking this final design and modifying it for this project by optimising it (or
at least trying to) for shorter keys. This trie structure is stored completely in
memory, allowing for very low response times for redirects. Pastes are served
from disk, but their lookup is also performed using the same in-memory trie.
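For reference, a plain ternary trie (without the Patricia-style skips of the final design) looks roughly like this. The names and structure are my own simplified sketch, not Lander's implementation:

```c
#include <stdlib.h>

/* A ternary trie node: smaller/larger siblings branch left/right,
 * the next character of the key continues through `eq`. */
typedef struct tnode {
    char ch;
    struct tnode *lo, *eq, *hi;
    const char *value; /* non-NULL if a key ends at this node */
} tnode;

static tnode *node_new(char ch) {
    tnode *n = calloc(1, sizeof(tnode));
    n->ch = ch;
    return n;
}

/* Insert key -> value; search depth is bounded by the key's length,
 * not by the number of stored entries. */
tnode *trie_insert(tnode *n, const char *key, const char *value) {
    if (n == NULL) n = node_new(*key);
    if (*key < n->ch)        n->lo = trie_insert(n->lo, key, value);
    else if (*key > n->ch)   n->hi = trie_insert(n->hi, key, value);
    else if (key[1] != '\0') n->eq = trie_insert(n->eq, key + 1, value);
    else                     n->value = value;
    return n;
}

/* Look up a key; returns its value or NULL if absent. */
const char *trie_get(const tnode *n, const char *key) {
    while (n != NULL) {
        if (*key < n->ch)        n = n->lo;
        else if (*key > n->ch)   n = n->hi;
        else if (key[1] != '\0') { n = n->eq; key++; }
        else                     return n->value;
    }
    return NULL;
}
```

In a URL shortener, the keys would be the short codes and the values the redirect targets, so a lookup touches at most as many nodes as the code has characters (plus sibling hops).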
## What's next?
Hopefully the above explanation provides some insight into the inner workings
of Lander. For those interested, the source code is of course available
[here](https://git.rustybever.be/Chewing_Bever/lander). I'm not quite done with
this project though.
My current vision is to have Lander be my personal URL shortener, pastebin &
file-sharing service. Considering a pastebin is basically a file-sharing
service for text files specifically, I'd like to combine these into a single
concept. The goal is to rework the storage system to support arbitrarily large
files, and to allow storing generic metadata for each entry. The initial
use case for this metadata would be storing the content type for uploaded files,
allowing this header to be correctly served when retrieving the files. This
combined with supporting large files turns Lander into a WeTransfer
alternative! Besides this, password protection and expiration of pastes are on
my to-do list as well. The data structure currently doesn't support removing
elements either, so this would need to be added in order to support expiration.
Hopefully a follow-up post announcing these changes will come soon ;)