Blocking tasks & worker threads #3

Closed
opened 2024-01-30 09:05:07 +01:00 by Jef Roosens · 2 comments

There's currently no way for a handler to perform a blocking task without
actively blocking its event loop thread. This of course isn't ideal, as it
completely cripples performance when several of these tasks need to be executed.

I think the best solution for this would be introducing the concept of worker
threads. When some blocking task needs to be performed, the handler schedules a
task using e.g. `lnm_loop_blocking_run`, which takes the loop, the current
connection struct and a blocking function as arguments. This blocking function
shouldn't be able to fail; its errors should be handled through logging and its
results written to the connection's context struct, just like a regular
handler.

The route handler should notify the event loop that it's running a blocking
task, perhaps by adding another value to the route's possible results. If a
route signals it's currently running a blocking task, its connection file
descriptor will simply not be added back to the event loop. Because we're
already using one-shot epoll events, this is trivial to implement.

To process blocking tasks, a configurable number of worker threads should be
spawned that ingest blocking tasks from a concurrent queue of tasks. For each
received blocking task, the worker thread executes the handler function, and
afterwards re-adds the connection's file descriptor to the epoll loop, allowing
it to be picked up again by the event loop.

In terms of how this could be integrated into the current design of steps that
are executed one by one, we could introduce the concept of "blocking steps".
These would be added to the list of steps like any other, but would be handled
by the worker threads instead of being run on the event loop. This might be the
most elegant way we could integrate these blocking tasks into the current
design.

To do

- [x] Implement concurrent worker queue
- [x] Add new states to signal to the event loop that work is blocking
- [x] Configurable number of worker threads
- [x] Make sure everything works with 0 worker threads (run tasks on the event loop threads instead)

Jef Roosens added the enhancement label 2024-01-30 09:05:07 +01:00

Instead of "spawning blocking tasks", the step processing pipeline could also
just be moved to a worker thread at some point if it knows it will require
blocking operations.

The whole point of the event loop is to efficiently interact with network
sockets using non-blocking network I/O. However, if the processing of the step
at that point doesn't require any network I/O, there's no reason for it to be
run on the event loop thread. This however doesn't mean that *all* work should
be run on the worker threads, as the context switch might take more time than
just executing on the event loop.

We could instead let steps be marked as needing network I/O or not, which would
then dictate on what thread they're run, allowing blocking tasks to be
transparently migrated to worker threads as needed.


It would be better to implement blocking work at the event loop level, not the
HTTP loop level. For this, we could introduce another event loop state:
`blocking`. The processing function sets this state to signal to the event loop
that its next execution should be run on a worker thread instead of on the
event loop thread. This also implies the processing function will not have
access to the event loop I/O for the duration of this state. If the event loop
state has switched back to one of the I/O states after the processing function
has executed, the worker thread re-enables the connection file descriptor,
allowing it to be picked up by the event loop.

Jef Roosens added reference blocking 2024-02-14 10:37:21 +01:00
Reference: Chewing_Bever/lnm#3