Concurrent LSM store #20

Open
opened 2023-11-16 18:22:41 +01:00 by Jef Roosens · 2 comments

There's currently no protection against two parallel insertions modifying the trie structure in the same place at the same time, e.g. two threads trying to add a new child node to the same node simultaneously. This bug is currently not an issue, as the application is single-threaded, but I'd like to fix it for when I do want to add multithreading.

My only real idea for solving this issue right now would be adding a mutex to every single node in the trie, but this sounds very inefficient.

Jef Roosens added the
Kind/Bug
label 2023-11-16 18:22:41 +01:00

This issue boils down to a bigger issue I've been avoiding: updating the trie's internal structure while not requiring readers to lock the trie structure itself.

[This article](https://questdb.io/blog/concurrent-lockfree-datastructure-design-walkthrough/) has given me some ideas on how to accomplish this.

Instead of keeping track of only a single trie, we keep track of two. Readers can use the read trie without locks, as it's immutable. Writers use the other trie. Writing is serialized using a simple mutex: after locking it, a writer first updates the write trie (note that this only applies to the trie's internal structure, not the entries, as those are already locked themselves; the tries can simply share the entries). After updating the trie, the tries are swapped atomically. At this point, readers will start using the new trie.

The important part here is that writers can only start writing to the write trie after all of its readers have stopped using it, ensuring we don't write to a trie that's still being read. After a write has completed, the tries can be swapped immediately.

In order to track how many readers are currently reading the old trie, we could introduce some sort of reference-counting pointer, perhaps with a condition variable that is signalled when the count drops to 0.

Steps for writing:

  1. Lock the write mutex
  2. Wait until the write trie no longer has any readers
  3. Update the trie as usual
  4. Swap the tries atomically

Steps for reading:

  1. Read from the reader struct

The nice thing about this design is that it favors read speeds, which is ideal for Lander.
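The read and write steps above could be sketched in Rust along these lines. This is a minimal sketch, not Lander's actual code: `DoubleTrie` and its fields are hypothetical names, and a plain `HashMap` stands in for the trie (the mutex around each map exists only because of that stand-in; the real read trie would be immutable and lock-free to read).

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};

// Hypothetical sketch: a HashMap stands in for the trie structure.
struct DoubleTrie {
    tries: [Mutex<HashMap<String, String>>; 2],
    read_idx: AtomicUsize,      // which copy readers currently use
    readers: Mutex<[usize; 2]>, // active readers per copy
    drained: Condvar,           // signalled when a count drops to 0
    write_lock: Mutex<()>,      // serializes writers (step 1)
}

impl DoubleTrie {
    fn new() -> Self {
        DoubleTrie {
            tries: [Mutex::new(HashMap::new()), Mutex::new(HashMap::new())],
            read_idx: AtomicUsize::new(0),
            readers: Mutex::new([0, 0]),
            drained: Condvar::new(),
            write_lock: Mutex::new(()),
        }
    }

    /// Reading: pin the current read copy, look up, unpin.
    fn get(&self, key: &str) -> Option<String> {
        let idx = {
            let mut counts = self.readers.lock().unwrap();
            let idx = self.read_idx.load(Ordering::Acquire);
            counts[idx] += 1; // count ourselves as a reader of this copy
            idx
        };
        // A real implementation reads without this inner lock; it's only
        // here because a mutable HashMap replaces the immutable trie.
        let result = self.tries[idx].lock().unwrap().get(key).cloned();
        let mut counts = self.readers.lock().unwrap();
        counts[idx] -= 1;
        if counts[idx] == 0 {
            self.drained.notify_all(); // wake a waiting writer
        }
        result
    }

    /// Writing, following the four steps: lock, drain, update, swap.
    /// The same update is then replayed on the old read copy so both
    /// tries stay in sync for the next write.
    fn insert(&self, key: &str, value: &str) {
        let _guard = self.write_lock.lock().unwrap(); // step 1
        let write_idx = 1 - self.read_idx.load(Ordering::Acquire);
        // Step 2: wait until the write copy has no readers left.
        let mut counts = self.readers.lock().unwrap();
        while counts[write_idx] > 0 {
            counts = self.drained.wait(counts).unwrap();
        }
        drop(counts);
        // Step 3: update the write copy.
        self.tries[write_idx]
            .lock()
            .unwrap()
            .insert(key.into(), value.into());
        // Step 4: swap atomically; readers now see the new copy.
        self.read_idx.store(write_idx, Ordering::Release);
        // Replay the write on the old copy once it has drained too.
        let old = 1 - write_idx;
        let mut counts = self.readers.lock().unwrap();
        while counts[old] > 0 {
            counts = self.drained.wait(counts).unwrap();
        }
        drop(counts);
        self.tries[old].lock().unwrap().insert(key.into(), value.into());
    }
}

fn main() {
    let t = DoubleTrie::new();
    t.insert("a", "1");
    assert_eq!(t.get("a"), Some("1".to_string()));
    t.insert("b", "2");
    assert_eq!(t.get("a"), Some("1".to_string()));
    assert_eq!(t.get("b"), Some("2".to_string()));
}
```

One consequence of replaying each write on the old copy is that every write happens twice, which is the price paid for the lock-free read path.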

Jef Roosens changed title from Possible race condition when inserting new entries to Concurrent LSM store 2023-12-09 11:21:44 +01:00
Jef Roosens added
Kind/Enhancement
and removed
Kind/Bug
labels 2023-12-09 11:22:05 +01:00

A different approach is called "crab latching".

Each node has a rwlock. When reading an element from the trie, we traverse down the trie by read-locking both the parent node and the child node. After having acquired the child node, we can release the parent node. We repeat this process until we reach our entry, which is already locked itself. Once we have the entry pointer, we can release the final child lock.

For writing, we write-lock each node on the path to the entry, only releasing the locks if we know for certain the nodes on the path will not need to be updated by our operation. For our trie, this is always the case, as inserts don't mutate the existing internal structure, meaning we can simply hold a write lock on a single node at a time, releasing the parent's lock once the child node has been locked.
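A minimal Rust sketch of this lock coupling, assuming a toy byte-keyed trie; `Node`, `lookup`, and `insert` are illustrative names, not Lander's API:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Toy trie node: every node carries its own rwlock.
struct Node {
    children: HashMap<u8, Arc<RwLock<Node>>>,
    entry: Option<String>,
}

impl Node {
    fn new() -> Arc<RwLock<Node>> {
        Arc::new(RwLock::new(Node {
            children: HashMap::new(),
            entry: None,
        }))
    }
}

/// Read path: read-lock the parent, acquire the child, then release
/// the parent before descending further.
fn lookup(root: &Arc<RwLock<Node>>, key: &[u8]) -> Option<String> {
    let mut current = Arc::clone(root);
    for &b in key {
        let next = {
            let node = current.read().unwrap(); // parent read lock
            Arc::clone(node.children.get(&b)?)  // child acquired
        }; // parent lock released here
        current = next;
    }
    let node = current.read().unwrap();
    node.entry.clone()
}

/// Write path: since nodes above the insertion point never need to be
/// restructured, one write lock at a time suffices; the parent's lock
/// is dropped as soon as the child is pinned.
fn insert(root: &Arc<RwLock<Node>>, key: &[u8], value: &str) {
    let mut current = Arc::clone(root);
    for &b in key {
        let next = {
            let mut node = current.write().unwrap(); // parent write lock
            Arc::clone(node.children.entry(b).or_insert_with(Node::new))
        }; // parent lock released here
        current = next;
    }
    current.write().unwrap().entry = Some(value.to_string());
}

fn main() {
    let root = Node::new();
    insert(&root, b"abc", "hello");
    assert_eq!(lookup(&root, b"abc"), Some("hello".to_string()));
    assert_eq!(lookup(&root, b"ab"), None);
}
```

Because locks are always taken parent-before-child, no two operations can ever acquire the same pair of locks in opposite order, which rules out deadlock.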

Jef Roosens added the
Project/Lander
label 2024-08-26 10:42:18 +02:00

Reference: Chewing_Bever/lander#20