---
title: Automating Minecraft Server Backups
date: 2023-09-07
---

I started playing Minecraft back in 2012, after the release of version 1.2.5. Like many gen Z'ers, I grew up playing the game day in, day out, and now, 11 years later, I love the game more than ever. One of the main reasons I still play is multiplayer: seeing the world evolve as the weeks go by, with everyone adding their own personal touches.

Naturally, as a nerd, I've developed the habit of hosting my own servers, as well as maintaining instances for friends. Having managed these servers, I've run into the same problem I've heard other people complain about as well: backing up the server.

{{< figure src="./the-village.jpg" title="Sneak peek of the village we live in" >}}

## The Problem

Like any piece of software, a Minecraft server instance writes files to disk, and these files, a combination of world data and configuration files, are what we wish to back up. The problem is that the server instance is constantly writing new data to disk. This conflicts with the "just copy the files" approach (e.g. `tar` or `rsync`), as these tools will often encounter errors when they try to read a file that's actively being written to. And because the server isn't aware it's being backed up, it may also write to a file the backup software has already read while the remaining files are still being processed. This produces an inconsistent backup, with data files that don't properly belong together.

There are two straightforward ways to solve this problem. The first would be to simply shut down the server before each backup. While this could definitely work without too much interruption, provided the backups are scheduled at times when no players are online, I don't find it very elegant.

The second solution is much more appealing. A Minecraft server can be controlled using certain console commands, the relevant ones here being `save-off`, `save-all`, and `save-on`. `save-off` tells the server to stop saving its data to disk and to cache it in memory instead, `save-all` flushes the server's data to disk, and `save-on` enables writing to disk again. Combining these commands gives us a way to back up a live Minecraft server: turn off saving using `save-off`, flush its data using `save-all`, back up the files, and turn saving back on using `save-on`. With these tools at my disposal, I started work on my own custom solution.
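
Put as a console session, the whole dance looks like this (the annotations are mine; they're not part of the commands):

```
save-off    -- pause writes to the world files
save-all    -- flush everything still in memory to disk
            -- (copy the files here, e.g. with tar or rsync)
save-on     -- resume normal saving
```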

## My solution

After some brainstorming, I ended up with a fairly simple approach: spawn the server as a child process, with the parent controlling the server's stdin. By taking control of the stdin, we can send commands to the server process as if we'd typed them into the terminal ourselves. I wrote the original proof of concept over two years ago during the pandemic, but it ended up sitting in a dead repository afterwards. However, a couple of months ago, some new motivation to work on the project popped into my head (I started caring a lot about our world), so I turned it into a fully fledged backup tool! The project's called alex and, as usual, it's open-source and available on my personal Gitea instance.
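
A minimal sketch of that principle, using only the Rust standard library (the `java` invocation and paths are illustrative, and this is not Alex's actual code):

```rust
use std::io::Write;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn the Minecraft server as a child process, taking over its stdin.
    let mut server = Command::new("java")
        .args(["-Xmx4G", "-jar", "server.jar", "nogui"])
        .stdin(Stdio::piped())
        .spawn()?;

    let stdin = server.stdin.as_mut().expect("stdin was piped");

    // The server reads these exactly as if an admin had typed them
    // into the console.
    writeln!(stdin, "save-off")?;
    writeln!(stdin, "save-all")?;

    // ... create the backup here ...
    // (In practice you'd also wait for the server to log that the save
    // finished before copying any files, since save-all is asynchronous.)

    writeln!(stdin, "save-on")?;

    server.wait()?;
    Ok(())
}
```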

Although Alex is a lot more advanced now than it was a couple of months ago, it still functions on the same principle of injecting the above commands into the server process's stdin. The real star of the show, however, is the way it handles its backups, which brings us to the next section.

## Incremental backups

You could probably describe my usual projects as overengineered, and Alex is no different. Originally, Alex simply created a full tarball every *n* minutes (powered by the lovely `tar-rs` library). While this definitely worked, it was slow. Compressing several gigabytes of world files always takes some time, and combined with shaky hard drive speeds, backups easily took 5-10 minutes. Normally this wouldn't bother me too much, but with this solution, the Minecraft server isn't writing to disk for the entire duration of the backup! If the server crashed during this window, all of that data would be lost.
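
For reference, that original full-tarball approach looks roughly like this with `tar-rs`, plus the `flate2` crate for gzip compression (the compression choice and paths are my assumptions; this is a simplification, not Alex's actual code):

```rust
use std::fs::File;

use flate2::{write::GzEncoder, Compression};
use tar::Builder;

fn full_backup() -> std::io::Result<()> {
    // Write a gzip-compressed tarball of the entire world directory.
    let file = File::create("backups/full.tar.gz")?;
    let mut archive = Builder::new(GzEncoder::new(file, Compression::default()));

    // Recursively add the world data, plus the main config file.
    archive.append_dir_all("world", "data/world")?;
    archive.append_path("data/server.properties")?;

    // Finish both the tar stream and the gzip stream.
    archive.into_inner()?.finish()?;
    Ok(())
}
```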

This called for a better method: incremental backups. For those unfamiliar, an incremental backup is a backup that only stores the changes that occurred since the last backup. This not only saves a ton of disk space, but it also greatly decreases the amount of data that needs to be compressed, speeding up the backup process tremendously.
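
How you decide what changed is up to the implementation; one simple approach (a sketch, not necessarily how Alex does it) is to compare each file's modification time against the time of the previous backup:

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Recursively collect all files under `dir` modified after `since`;
/// these are the candidates for the next incremental backup.
fn changed_files(dir: &Path, since: SystemTime) -> std::io::Result<Vec<PathBuf>> {
    let mut changed = Vec::new();

    for entry in fs::read_dir(dir)? {
        let path = entry?.path();

        if path.is_dir() {
            // Descend into subdirectories (e.g. region files per dimension).
            changed.extend(changed_files(&path, since)?);
        } else if fs::metadata(&path)?.modified()? > since {
            changed.push(path);
        }
    }

    Ok(changed)
}
```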

Along with this, I introduced the concept of "chains". Because an incremental backup describes the changes that occurred since the last backup, it needs that previous backup in order to be fully restored. This also implies that the first incremental backup needs to be based on a full backup. A chain defines a list of sequential backups that each depend on the one before them, with every chain starting from a full backup.

All of this combined resulted in the following configuration model for backups: the admin can configure one or more backup schedules, with each schedule being defined by a name, a frequency, a chain length, and how many chains to keep. For each schedule, a new backup is created periodically according to the defined frequency, and appended to that schedule's current chain. If the chain is full (as defined by the chain length), a new chain is started. Finally, the admin can configure how many of these full chains to keep.
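
Conceptually, a schedule boils down to a handful of fields. The names below are illustrative, not Alex's actual configuration format:

```rust
use std::time::Duration;

/// One backup schedule, as described above (illustrative field names).
struct Schedule {
    /// Human-readable name, e.g. "30min" or "daily".
    name: String,
    /// How often a new backup is created.
    frequency: Duration,
    /// Backups per chain; a length of 1 means every backup is a full one.
    chain_length: u32,
    /// How many complete chains to retain before pruning the oldest.
    chains_to_keep: u32,
}
```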

As an example, my server currently uses a dual-schedule system (expressed in code below):

- One schedule is called "30min". As the name suggests, it has a frequency of 30 minutes. It stores chains of length 48 and keeps 1 full chain. This lets me create incremental backups (which take 5-10 seconds) every 30 minutes, and restore them at 30-minute granularity up to 24 hours back.
- The second schedule is called "daily", and this one simply creates a full backup (a chain length of 1) every 24 hours, with 7 chains being stored. This lets me roll back at 24-hour granularity up to 7 days back.
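
Using the illustrative `Schedule` type from above, that setup would look something like this:

```rust
let schedules = vec![
    // 48 incremental backups per chain, one every 30 minutes: 24 hours
    // of 30-minute granularity, with a single full chain kept around.
    Schedule {
        name: "30min".to_string(),
        frequency: Duration::from_secs(30 * 60),
        chain_length: 48,
        chains_to_keep: 1,
    },
    // Chains of length 1 are just standalone full backups: one per day,
    // with the last 7 days retained.
    Schedule {
        name: "daily".to_string(),
        frequency: Duration::from_secs(24 * 60 * 60),
        chain_length: 1,
        chains_to_keep: 7,
    },
];
```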

This setup would never have been possible without incremental backups, as the 30-minute backups would simply have taken too long otherwise. The required disk space would also have been rather unwieldy, as I'd rather not store 48 multi-gigabyte backups per day. With the incremental backup system, each backup after the initial full one is only a few megabytes!

Of course, a tool like this wouldn't be complete without some management utilities, so the Alex binary contains tools for restoring backups, exporting incremental backups as a new full backup, and unpacking a backup.

## What's next?

There are still some improvements I'd like to make to Alex itself, notably making it more aware of the server's internal state by parsing its logs, and making it possible to restore backups without having to stop the Alex instance (which is rather cumbersome in Docker containers).

On a bigger scale, however, there's another possible route to take: adding a central server component to which an Alex instance can publish its backups. This server would have a user management system, allowing certain users of the Minecraft server to access the backups for offline use. It could perhaps also show the server instance's logs, as well as handle syncing the backups to another location, such as an S3 store. This would make the entire system more resistant to data loss.

Of course, I'm well aware these ideas are rather ambitious, but I'm excited to see where this project might go next!

That being said, Alex is available as statically compiled binaries for amd64 and arm64 on my Gitea. If you're interested in following the project, Gitea recently added repository RSS feeds ;)