fix rss feeds; move stuff around

parent cd10df9d32, commit 323b9e2e3c
9 changed files with 16 additions and 60 deletions

---
title: "Automating Minecraft Server Backups"
date: 2023-09-07
---

I started playing Minecraft back in 2012, after the release of version 1.2.5. Like many Gen Z'ers, I grew up playing the game day in, day out, and now, 11 years later, I love the game more than ever. One of the main reasons I still play is multiplayer: seeing the world evolve as the weeks go by, with everyone adding their own personal touches.

Naturally, as a nerd, I've grown the habit of hosting my own servers, as well as maintaining instances for friends. Having managed these servers, I've run into the same problem I've heard many other people complain about: backing up the server.

{{< figure src="./the-village.jpg" title="Sneak peek of the village we live in" >}}

## The Problem

Like any piece of software, a Minecraft server instance writes files to disk, and these files, a combination of world data and configuration files, are what we wish to back up. The problem is that the server instance is constantly writing new data to disk. This conflicts with the "just copy the files" approach (e.g. `tar` or `rsync`), as these tools will often encounter errors because they're trying to read a file that's actively being written to. Because the server isn't aware it's being backed up, it may also write to a file the backup software has already read while the remaining files are still being processed. This produces an inconsistent backup, with data files that don't properly belong together.

There are two straightforward ways to solve this problem. The first would be to simply turn off the server before each backup. While this could definitely work without too much interruption, provided the backups are scheduled at times when no players are online, I don't find it very elegant.

The second solution is much more appealing. A Minecraft server can be controlled using certain console commands, the relevant ones here being `save-off`, `save-all`, and `save-on`. `save-off` tells the server to stop saving its data to disk and cache it in memory instead, `save-all` flushes the server's data to disk, and `save-on` enables writing to disk again. Combining these commands gives us a way to back up a live Minecraft server: turn off saving using `save-off`, flush pending data using `save-all`, back up the files, and turn saving back on using `save-on`. With these tools at my disposal, I started work on my own custom solution.

## My solution

After some brainstorming, I ended up with a fairly simple approach: spawn the server as a child process, with the parent controlling the server's stdin. By taking control of stdin, we can send commands to the server process as if we'd typed them into the terminal ourselves. I wrote the original proof of concept over two years ago during the pandemic, but it ended up sitting in a dead repository afterwards. However, a couple of months ago, some new motivation to work on the project popped into my head (I started caring a lot about our world), so I turned it into a fully fledged backup tool! The project's called [alex](https://git.rustybever.be/Chewing_Bever/alex) and, as usual, it's open-source and available on my personal Gitea instance.

Although Alex is a lot more advanced now than it was a couple of months back, it still functions on the same principle of injecting the above commands into the server process's stdin. The real star of the show, however, is the way it handles its backups, which brings us to the next section.

## Incremental backups

You could probably describe my usual projects as overengineered, and Alex is no different. Originally, Alex simply created a full tarball every `n` minutes (powered by the lovely [tar-rs](https://github.com/alexcrichton/tar-rs) library). While this definitely worked, it was *slow*. Compressing several gigabytes of world files always takes some time, and this, combined with shaky hard drive speeds, resulted in backups easily taking 5-10 minutes. Normally this wouldn't bother me too much, but with this solution, the Minecraft server isn't writing to disk for the entire duration of the backup! If the server crashed during this time, all that data would be lost.

This called for a better method: incremental backups. For those unfamiliar, an incremental backup only stores the changes that occurred since the last backup. This not only saves a ton of disk space, but also greatly decreases the amount of data that needs to be compressed, speeding up the backup process tremendously.

Along with this, I introduced the concept of "chains". Because an incremental backup describes the changes that occurred since the last backup, it needs that previous backup in order to be fully restored. This also implies that the first incremental backup needs to be based off a full backup. A chain is a list of sequential backups that each depend on the one before them, with each chain starting with a full backup.

All of this combined resulted in the following configuration model: the admin can define one or more backup schedules, each described by a name, a frequency, a chain length, and the number of chains to keep. For each schedule, a new backup is created periodically according to the defined frequency and appended to that schedule's current chain. If the chain is full (as defined by the chain length), a new chain is started. Finally, the admin can configure how many of these full chains to keep.

As an example, my server currently uses a dual-schedule system:

* One schedule is called "30min". As the name suggests, it has a frequency of 30 minutes. It stores chains of length 48 and keeps 1 full chain. This lets me create incremental backups (which take 5-10 seconds) every 30 minutes, and I can restore these backups at this 30-minute granularity up to 24 hours back.
* The second schedule is called "daily", and it simply creates a full backup (a chain length of 1) every 24 hours, with 7 chains being stored. This allows me to roll back with a 24-hour granularity up to 7 days back.

This setup would never have been possible without incremental backups, as the 30-minute backups would simply have taken too long otherwise. The required disk space would also have been rather unwieldy, as I'd rather not store 48 multi-gigabyte backups per day. With the incremental backup system, each backup after the initial full backup is only a few megabytes!

Of course, a tool like this wouldn't be complete without some management utilities, so the Alex binary also contains tools for restoring backups, exporting incremental backups as a new full backup, and unpacking a backup.

## What's next?

There are still some improvements I'd like to add to Alex itself, notably making it more aware of the server's internal state by parsing its logs, and making it possible to restore backups without having to stop the Alex instance (which is rather cumbersome in Docker containers).

On a bigger scale, however, there's another possible route to take: adding a central server component that an Alex instance can publish its backups to. This server would have a user management system allowing certain users of the Minecraft server to access the backups for offline use. It could perhaps also show the logs of the server instance, as well as handle syncing the backups to another location, such as an S3 store. This would make the entire system more resistant to data loss.

Of course, I'm well aware these ideas are rather ambitious, but I'm excited to see where this project might go next!

That being said, Alex is available as statically compiled binaries for `amd64` and `arm64` [on my Gitea](https://git.rustybever.be/Chewing_Bever/alex). If you're interested in following the project, Gitea recently added repository [RSS feeds](https://git.rustybever.be/Chewing_Bever/alex.rss) ;)

---
title: "Designing my own URL shortener"
date: 2023-10-14
---

One of the projects I've always found to be a good choice for a side project is a URL shortener. The core idea is simple and fairly easy to implement, yet it allows for a lot of creativity in how you implement it. Once you're done with the core idea, you can start expanding the project as you wish: expiring links, password protection, or perhaps a management API. The possibilities are endless!

Naturally, this post talks about my own version of a URL shortener: [Lander](https://git.rustybever.be/Chewing_Bever/lander). To add some extra challenge to the project, I've chosen to write it from the ground up in C, implementing my own event loop and building an HTTP server on top to use as the base for the URL shortener.

## The event loop

Lander consists of three layers: the event loop, the HTTP loop, and finally the Lander-specific code. Each layer builds on the one below it, with the event loop at the bottom. This layer directly interacts with the networking stack and ensures bytes are received from and written to the client. The book [Build Your Own Redis](https://build-your-own.org/redis/) by James Smith was an excellent starting point, and I highly recommend checking it out! It taught me everything I needed to know to start this project.

Now for a slightly more technical dive into the inner workings of the event loop. The event loop listens on the listening TCP socket for incoming connections and directly processes requests. Each iteration of the event loop takes the following steps:

1. For each of the open connections:
   1. Perform network I/O
   2. Execute data processing code, provided by the upper layers
   3. Close finished connections
2. Accept a new connection if needed

The event loop runs on a single thread, constantly going through this cycle to process requests. Here, the "data processing code" is a set of function pointers passed to the event loop that get executed at specific times. This is how the HTTP loop is able to inject its functionality into the event loop.

In the event loop, a connection can be in one of three states: `request`, `response`, or `end`. In `request` mode, the event loop tries to read incoming data from the client into the read buffer, which is then consumed by the data processing code's data handler. In `response` mode, the data processing code's data writer is called to populate the write buffer, which is then written to the connection socket. Finally, the `end` state simply tells the event loop that the connection should be closed without any further processing. A connection can switch between `request` and `response` mode as many times as needed, allowing connections to be reused for multiple requests from the same client.

The event loop provides all the building blocks needed for a client-server type application. These are then used to implement the next layer: the HTTP loop.

## The HTTP loop

Before we can design a specific HTTP-based application, we need a base to build on. This base is the HTTP loop. It handles both serializing and deserializing of HTTP requests & responses, along with providing commonly used functionality, such as bearer authentication and reading & writing files to & from disk. The request parser is provided by the excellent [picohttpparser](https://github.com/h2o/picohttpparser) library. The parsed request is stored in the request's data struct, giving all necessary functions access to this data.

The HTTP loop defines a request handler function which is passed to the event loop as the data handler. This function first tries to parse the request before routing it accordingly. For routing, both literal string matches and RegEx-based routing are available.

Each route consists of one or more steps, where each step is a function that tries to advance the processing of the current request. The return value of a step tells the HTTP loop whether it has finished its task or is still waiting for I/O. The latter instructs the HTTP loop to skip the request for now, delaying its processing until the next cycle. In each cycle of the HTTP loop (or rather, the event loop), a request tries to advance its processing as much as possible by executing as many steps as possible, in order. This means that very small requests can be processed completely within a single cycle. Common functionality is provided as predefined steps; one example is the `http_loop_step_body_to_buf` step, which reads the request body into a buffer.

The HTTP loop also provides the data writer functionality, which streams an HTTP response into the write buffer. The contents of the response are tracked in the request's data struct, and these structs are recycled between requests on the same connection, preventing unnecessary allocations.

## Lander

Above the HTTP loop, we finally reach the code specific to Lander. It might not surprise you that this layer is the smallest of the three, as the abstractions below allow it to focus on the task at hand: serving and storing HTTP redirects (and pastes). The way these are stored, however, is in my opinion rather interesting.

For our Algorithms & Datastructures 3 course, we had to design three different trie implementations in C: a Patricia trie, a ternary trie, and a "custom" trie, where we were allowed to experiment with different ideas. For those unfamiliar, a trie is a tree-like data structure used for storing strings. The keys used in this tree are the strings themselves, with each character causing the tree to branch off. Each string is stored at depth `m`, with `m` being the length of the string. This also means that the search depth of a string is bounded not by the size of the trie, but by the length of the string! This allows for extremely fast lookup times for short keys, even with a large number of entries.

My design ended up being a combination of a Patricia trie and a ternary trie: a ternary trie that supports skips the way a Patricia trie does. I took this final design and modified it for this project, optimising it (or at least trying to) for shorter keys. This trie is stored completely in memory, allowing for very low response times for redirects. Pastes are served from disk, but their lookup is performed using the same in-memory trie.

## What's next?

Hopefully the above explanation provides some insight into the inner workings of Lander. For those interested, the source code is of course available [here](https://git.rustybever.be/Chewing_Bever/lander). I'm not quite done with this project though.

My current vision is for Lander to be my personal URL shortener, pastebin & file-sharing service. Considering a pastebin is basically a file-sharing service specifically for text files, I'd like to combine these into a single concept. The goal is to rework the storage system to support arbitrarily large files and to allow storing generic metadata for each entry. The initial use case for this metadata would be storing the content type of uploaded files, allowing this header to be served correctly when retrieving them. Combined with support for large files, this turns Lander into a WeTransfer alternative! Besides this, password protection and expiration of pastes are on my to-do list as well. The data structure currently doesn't support removing elements, so that would need to be added in order to support expiration.

Hopefully a follow-up post announcing these changes will come soon ;)