S3 storage backend #10

Open
opened 2024-07-07 10:13:37 +02:00 by Jef Roosens · 0 comments

S3 storage can be a lot cheaper than the regular disk storage that comes with a VPS or similar, so having it as an option for a storage backend could be useful.

The [rust-s3](https://github.com/durch/rust-s3) library seems to be well developed and fully functional. Using an `FsProvider` trait, we can implement a common file system-like interface that wraps the S3 API.
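As a rough sketch of what such an interface might look like: the method names below (`write`, `read`, `remove`, `exists`) and the `MemoryFs` type are assumptions made for illustration, not an existing Rieter or rust-s3 API. A real version would most likely be async and stream data instead of buffering it; the in-memory backend only stands in for a local-disk or S3-backed implementation of the same trait.

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical storage abstraction; the method set is illustrative only.
pub trait FsProvider {
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()>;
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;
    fn remove(&mut self, path: &str) -> io::Result<()>;
    fn exists(&self, path: &str) -> bool;
}

/// In-memory stand-in for a backend. An S3-backed type would implement
/// the same trait by translating each call into an S3 request.
#[derive(Default)]
pub struct MemoryFs {
    files: HashMap<String, Vec<u8>>,
}

fn not_found(path: &str) -> io::Error {
    io::Error::new(io::ErrorKind::NotFound, format!("no such object: {path}"))
}

impl FsProvider for MemoryFs {
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        // Overwrites any existing object stored under the same path.
        self.files.insert(path.to_owned(), data.to_vec());
        Ok(())
    }

    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        self.files.get(path).cloned().ok_or_else(|| not_found(path))
    }

    fn remove(&mut self, path: &str) -> io::Result<()> {
        self.files
            .remove(path)
            .map(|_| ())
            .ok_or_else(|| not_found(path))
    }

    fn exists(&self, path: &str) -> bool {
        self.files.contains_key(path)
    }
}
```

Code that only depends on `FsProvider` would not need to know whether the bytes end up on local disk or in a bucket.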

One important topic here is how we return data from the S3 store. S3 often offers much higher bandwidth than a VPS, so ideally there would be an option to either proxy the S3 requests through the server or serve redirects that point directly at the S3 resources. Redirecting would move the responsibility of serving the files entirely to the S3 store, without data having to flow through the Rieter instance. It might be worth looking into implementing a [Tower Service](https://docs.rs/tower-service/0.3.2/tower_service/trait.Service.html) for this, to abstract away details such as sending redirects.
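The proxy-versus-redirect choice could be modelled as a small decision function that a request handler (or a Tower service wrapping it) consults. This is only a sketch: `ServeMode`, `Delivery` and `plan_delivery` are hypothetical names, the 307 status code is one reasonable choice among several, and the `presign` closure stands in for whatever the S3 client uses to produce a download URL.

```rust
/// Hypothetical config value: how files stored in S3 reach the client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServeMode {
    /// The server fetches the object and streams it to the client.
    Proxy,
    /// The server answers with a redirect to the S3 resource.
    Redirect,
}

/// What the HTTP layer should do for a single download request.
#[derive(Debug, PartialEq, Eq)]
pub enum Delivery {
    Stream { object_key: String },
    Redirect { status: u16, location: String },
}

/// Decide how to serve `object_key`. `presign` is an assumed helper that
/// turns an object key into a URL the client can fetch directly.
pub fn plan_delivery(
    mode: ServeMode,
    object_key: &str,
    presign: impl Fn(&str) -> String,
) -> Delivery {
    match mode {
        ServeMode::Proxy => Delivery::Stream {
            object_key: object_key.to_owned(),
        },
        ServeMode::Redirect => Delivery::Redirect {
            // 307 keeps the request method unchanged on the follow-up request.
            status: 307,
            location: presign(object_key),
        },
    }
}
```

Keeping the decision separate from the HTTP types makes it easy to test, and leaves the actual response building to the web framework.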

Alongside this, I would recommend adding a config option to limit how many packages are in the parse queue at the same time. Libarchive needs the files locally, so every package in the queue has to be stored on the server's disk. Limiting the size of the parse queue would prevent newly uploaded packages from exhausting the (possibly limited) disk space of the Rieter instance. Once a file has been offloaded to the S3 store, another package can be added to the queue. We could perhaps also look at the sizes of the packages being uploaded and place a "max size on disk" cap on the queue, but this will be more complex to implement.
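A minimal sketch of such an admission check, covering both the package-count limit and the optional "max size on disk" cap. The `ParseQueueLimit` type and its field names are made up for illustration; a real server would also need to share this state between tasks (for example behind a mutex, or by using a semaphore for the count limit).

```rust
/// Hypothetical bookkeeping for packages waiting on local disk to be parsed.
pub struct ParseQueueLimit {
    max_packages: usize,
    /// Optional cap on the total size of queued packages, in bytes.
    max_bytes: Option<u64>,
    queued_packages: usize,
    queued_bytes: u64,
}

impl ParseQueueLimit {
    pub fn new(max_packages: usize, max_bytes: Option<u64>) -> Self {
        Self {
            max_packages,
            max_bytes,
            queued_packages: 0,
            queued_bytes: 0,
        }
    }

    /// Try to reserve room for a package of `size` bytes.
    /// Returns false if either limit would be exceeded.
    pub fn try_admit(&mut self, size: u64) -> bool {
        if self.queued_packages >= self.max_packages {
            return false;
        }
        if let Some(cap) = self.max_bytes {
            if self.queued_bytes.saturating_add(size) > cap {
                return false;
            }
        }
        self.queued_packages += 1;
        self.queued_bytes = self.queued_bytes.saturating_add(size);
        true
    }

    /// Free the reservation once the package has been offloaded to S3.
    pub fn release(&mut self, size: u64) {
        self.queued_packages = self.queued_packages.saturating_sub(1);
        self.queued_bytes = self.queued_bytes.saturating_sub(size);
    }
}
```

An upload handler would call `try_admit` before accepting a package and either reject or delay the upload when it returns `false`.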

Alternatively, packages to parse could be uploaded directly to the S3 store using a pre-signed PUT request, with the Rieter server downloading each newly uploaded package when it is ready to parse it. This wouldn't block the queue, but parsing would be slowed down depending on how fast the connection between S3 and the Rieter server is.

Jef Roosens added the idea label 2024-07-07 10:13:37 +02:00
Reference: Chewing_Bever/rieter#10