Tuesday, March 1, 2016

Designing xfork

Recently, I got to know a small team working on a problem that they tried to solve by using threads. As expected, problems popped up soon and development slowed down considerably. So, building on the previous post, I would like to lay out my intentions and design decisions regarding xfork, a module I've written and actively maintain as part of my analysis of the newly introduced async/await syntax.

Concurrency is a hard engineering problem.

Take it seriously and even consider not being concurrent a valid option.

Design Assumptions

I created xfork from the following observations based on my own experience. Developers usually:

  1. don't understand 100% of the problem's domain.
  2. need a simple approach to get things right.
  3. understand code written in a sequential style.
  4. don't know what environment their code is running on, for example:
    1. How many cores does the target system have?
    2. How many processes are allowed on the target system, and how many would wrestle it down?
    3. How much memory does the target system have?
    4. How often will the code be re-used and re-executed?

Each observation will be addressed by one of the following sections.

Background Tasks

Observation 1 stems from a pretty basic human property. So, let me put it bluntly:
  • we don't want processes
  • we don't want threads
  • we don't want coroutines
What we really want is faster execution. Parallel (or at least concurrent) execution is just a means to an end here. In turn, processes, threads and coroutines are just a means to parallel execution. So, we had better build an abstraction that is actually closer to the developer's problem: faster execution.

Let's start by calling units of execution which can run independently "background tasks" or simply "tasks".

Task Hierarchy

In order to address observation 2, something to structure a collection of tasks is needed.

A software developer is just a normal person who needs simple solutions for the job. Something that has emerged several times throughout human history is the hierarchy. Humans understand hierarchies pretty well. Most companies are structured this way, your file and folder system is probably hierarchical, as is your governmental system or the process tree of your computer, tablet or smartphone.

To put it simply, a hierarchy is a layered system, so you only care about the layer above and below you, and each layer is represented by a single representative, which greatly simplifies communication with the layers above and below. These two properties have made hierarchies quite successful so far.

That said, we go with a hierarchical system for now when it comes to concurrency. That means there is one task managing a bunch of independent and similar tasks. Managing basically comprises task creation, result collection and result processing.
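To make the three managing duties concrete, here is a minimal sketch using Python's standard concurrent.futures module. It is not xfork's API; the names manager and square are made up for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """One of the independent, similar tasks (hypothetical example)."""
    return n * n


def manager(numbers):
    """The single managing task one layer above the workers."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Task creation: one background task per input.
        futures = [pool.submit(square, n) for n in numbers]
        # Result collection: wait for every task, in submission order.
        results = [future.result() for future in futures]
    # Result processing: combine the collected results.
    return sum(results)


total = manager([1, 2, 3])
```

The caller of manager only sees one function call and one result; the tasks below it stay hidden, which is the layering property described above.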

Functions as Tasks

xfork has been designed to address observation 3 and to take the warning from the beginning seriously. A main goal was to make hopping back and forth between the sequential and the concurrent style of programming as easy as possible.

The most basic concept developers usually understand is the function. Thus, functions act as a kind of bridge between the two worlds. A function can be executed either by waiting for its result (sequential style) or by submitting it to a background worker and requesting its result at a later point (concurrent style).

This becomes especially clear when working with a large legacy code base. You might finally consider using concurrent approaches to speed things up, but a complete rewrite is out of the question. One does not simply throw away a large collection of already working functions.
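As a rough illustration of the two styles, the following sketch runs the same unmodified function both ways. It uses the standard library's concurrent.futures rather than xfork itself, and word_count is just a made-up stand-in for an existing legacy function.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """A plain function; nothing in it knows about concurrency."""
    return len(text.split())


# Sequential style: call the function and wait for its result.
sequential_result = word_count("one two three")

# Concurrent style: submit the same function to a background worker
# and request its result at a later point.
with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(word_count, "one two three")
    # ... other work could happen here while the task runs ...
    concurrent_result = future.result()
```

The function body is identical in both cases; only the call site changes, which is what makes it possible to keep a legacy code base as it is.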

Task Management

Task management should be done for you by xfork, which addresses observation 4. It should take care of the question of whether to create a thread or a process for a background task. Moreover, the number of processes and threads needs to be managed without developer interaction: xfork should create them and shut them down on the fly, according to the machine's capabilities.

When should a task be a process, when should it be a thread and when should it be implemented as a coroutine running in an event loop? The last post gives a pretty simple explanation for this. Processes utilize the multicore architectures of today's computers, so they are suitable for CPU-bound tasks. Coroutines are designed to wait for I/O efficiently, so I/O-bound tasks are their use case. Threads sit somewhere in the middle, especially because of the GIL of CPython. So right now, they are best used for I/O-bound tasks.

That said, the main exercise for a developer using xfork is deciding whether their function is I/O-bound or CPU-bound and whether it is thread-safe.
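A minimal sketch of that decision, again with the standard library and not with xfork's actual implementation, could look like this. The helper make_executor and the factor of 4 for I/O-bound workers are assumptions chosen for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(cpu_bound):
    """Pick a worker pool from the task type (hypothetical helper)."""
    # Ask the machine instead of the developer how many cores there are.
    cores = os.cpu_count() or 1
    if cpu_bound:
        # Processes sidestep the GIL, so use one worker per core.
        return ProcessPoolExecutor(max_workers=cores)
    # I/O-bound tasks mostly wait, so more threads than cores is fine.
    # The factor of 4 is an arbitrary assumption, not a tuned value.
    return ThreadPoolExecutor(max_workers=cores * 4)


# Usage: an I/O-bound task ends up in a thread pool.
with make_executor(cpu_bound=False) as pool:
    io_result = pool.submit(len, "hello").result()
```

The developer only answers the question "CPU-bound or I/O-bound?"; the pool type and the worker count are derived from that answer and from the machine.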

Conclusion

With all observations addressed, I think it's time to take a break. The next post will investigate the current implementation of xfork.

Best,
Sven
