Better database file design #53

Open
opened 2024-08-31 10:53:05 +02:00 by Jef Roosens · 0 comments

I'd like to switch to CouchDB's database design (https://docs.couchdb.org/en/stable/intro/overview.html).

  • Transactions are appended to the database file (existing data is never overwritten)
  • Each transaction is followed by a header marking the end of the transaction

This technique makes the database file resilient to crashes. If a crash occurs while a transaction is being written, its data is simply discarded on startup. Only data followed by a valid header is part of the database.
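
A rough sketch of how the append-and-commit idea could look. The record layout here is purely an assumption for illustration (a length prefix, the payload, then a trailing header with a magic value and a CRC32); the names `encode_transaction` and `recover` are hypothetical and not part of Lander.

```python
import struct
import zlib

# Hypothetical on-disk layout per transaction (not Lander's actual format):
#   [u32 payload length][payload bytes][trailing header: magic + CRC32]
MAGIC = b"TXN1"
LENGTH = struct.Struct("<I")
COMMIT = struct.Struct("<4sI")


def encode_transaction(payload: bytes) -> bytes:
    """Return the bytes to append to the file for one transaction."""
    return (
        LENGTH.pack(len(payload))
        + payload
        + COMMIT.pack(MAGIC, zlib.crc32(payload))
    )


def recover(data: bytes) -> tuple[list[bytes], int]:
    """Scan left to right and return (committed payloads, valid length).

    Scanning stops at the first transaction that is truncated or whose
    trailing header does not match, so a torn write is simply ignored.
    """
    payloads = []
    pos = 0
    while pos + LENGTH.size <= len(data):
        (n,) = LENGTH.unpack_from(data, pos)
        start = pos + LENGTH.size
        end = start + n + COMMIT.size
        if end > len(data):
            break  # payload or header was cut off by a crash
        payload = data[start:start + n]
        magic, crc = COMMIT.unpack_from(data, start + n)
        if magic != MAGIC or crc != zlib.crc32(payload):
            break  # header missing or payload corrupted
        payloads.append(payload)
        pos = end
    return payloads, pos
```

The returned valid length could be used to truncate the file on startup so new transactions are appended right after the last committed one.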

This would remove the need for the index file, as we can simply read the database file from left to right on startup.

Updates to entries get handled the same way as inserts. On startup, we insert each entry into the in-memory trie, overwriting an already existing entry.
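
To illustrate the replay step, here is a minimal sketch assuming the entries have already been decoded from the file into `(key, value)` pairs in file order. The `Trie` class and `replay` function are hypothetical stand-ins, not Lander's actual trie.

```python
class Trie:
    """Minimal character trie; a stand-in for the real in-memory trie."""

    def __init__(self):
        self.children = {}
        self.value = None
        self.has_value = False

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        # An update is just an insert: the newer value replaces the old one.
        node.value = value
        node.has_value = True

    def get(self, key):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value if node.has_value else None


def replay(entries):
    """Rebuild the trie from entries in file order (last write wins)."""
    trie = Trie()
    for key, value in entries:
        trie.insert(key, value)
    return trie
```

Because later entries overwrite earlier ones, no separate "update" record type is needed in this sketch.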

Of course, this sort of design should ideally be paired with some compaction mechanism, as the file can only ever grow.
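
One possible shape for compaction, sketched under heavy assumptions: the log is treated as one JSON `[key, value]` pair per line (purely a stand-in for the real binary format), and `compact` is a hypothetical helper. The idea is to write only the latest value per key to a new file and then swap it in with a rename.

```python
import json
import os


def compact(path):
    """Rewrite the log at `path` keeping only the latest value per key."""
    live = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            key, value = json.loads(line)
            live[key] = value  # later entries overwrite earlier ones

    tmp = path + ".compact"  # hypothetical temporary file name
    with open(tmp, "w", encoding="utf-8") as f:
        for key, value in live.items():
            f.write(json.dumps([key, value]) + "\n")
        f.flush()
        os.fsync(f.fileno())

    # Replace the old file in one step, so a crash during compaction
    # leaves either the old file or the new one in place.
    os.replace(tmp, path)
```

Writing to a separate file keeps the append-only property of the original intact until the very last step.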

I'm also reconsidering storing all data inside the database file. This could allow the database to be a single file at all times, but it would mean the file could grow considerably if large data files are uploaded.

The very first thing that's written to the database is the database version. This covers my ass in case I ever want to redesign the database a third time, and it simplifies migrations.
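
A small sketch of what a version prefix could look like. The magic bytes `LNDR`, the 16-bit version field and the function names are all assumptions made up for this example.

```python
import struct

# Hypothetical file prefix: 4 magic bytes followed by a u16 version.
FILE_MAGIC = b"LNDR"
FILE_HEADER = struct.Struct("<4sH")


def encode_file_header(version: int) -> bytes:
    """Return the bytes written at the very start of a new database file."""
    return FILE_HEADER.pack(FILE_MAGIC, version)


def read_version(data: bytes) -> int:
    """Return the database version, or raise ValueError if it is unreadable."""
    if len(data) < FILE_HEADER.size:
        raise ValueError("file too short to contain a version header")
    magic, version = FILE_HEADER.unpack_from(data, 0)
    if magic != FILE_MAGIC:
        raise ValueError("not a database file")
    return version
```

On startup, the version returned by `read_version` could decide which loader or migration path to run before anything else in the file is interpreted.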

We could even take it a step further and use a page-based approach with a free list, similar to SQLite (https://sqlite.org/fileformat2.html). This would allow us to reuse parts of the file as needed. The page size would be written into the first page of the file, next to the database version. This would also allow resizing a database as needed with different page sizes.
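
To make the page idea a bit more concrete, here is a deliberately simplified in-memory sketch. It uses a plain singly linked free list, which is simpler than SQLite's actual freelist structure, and the metadata layout and the `PagedFile` class are assumptions for illustration only.

```python
import struct

# Hypothetical metadata at the start of page 0:
#   u16 version, u32 page size, u32 free-list head, u32 page count
META = struct.Struct("<HIII")
NEXT = struct.Struct("<I")  # a free page stores the next free page number


class PagedFile:
    def __init__(self, page_size=4096, version=1):
        if page_size < META.size:
            raise ValueError("page size too small for metadata")
        self.page_size = page_size
        self.version = version
        self.free_head = 0  # 0 means "no free pages" (page 0 is metadata)
        self.page_count = 1
        self.data = bytearray(page_size)
        self._write_meta()

    def _write_meta(self):
        META.pack_into(self.data, 0, self.version, self.page_size,
                       self.free_head, self.page_count)

    def allocate(self):
        """Return a usable page number, reusing a freed page if possible."""
        if self.free_head != 0:
            page = self.free_head
            (self.free_head,) = NEXT.unpack_from(
                self.data, page * self.page_size)
        else:
            page = self.page_count
            self.page_count += 1
            self.data.extend(bytes(self.page_size))
        self._write_meta()
        return page

    def free(self, page):
        """Push a page onto the free list (double frees are not detected)."""
        if not 0 < page < self.page_count:
            raise ValueError("invalid page number")
        NEXT.pack_into(self.data, page * self.page_size, self.free_head)
        self.free_head = page
        self._write_meta()
```

Note that reusing pages means existing bytes do get overwritten, so this variant would need its own crash-safety story on top of what the append-only design gives for free.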

Jef Roosens added the Kind/Enhancement and Project/Lander labels 2024-08-31 10:53:05 +02:00
Jef Roosens added the idea label and removed the Kind/Enhancement label 2024-09-05 15:16:02 +02:00
Reference: Chewing_Bever/lander#53