More speed by having two rails. Not always true. |
Improving Performance by Running Independent Tasks Concurrently - A Survey
Processes | Threads | Coroutines | |
---|---|---|---|
purpose | cpu-bound tasks | cpu- & i/o-bound tasks | i/o-bound tasks |
customizable | no | no | yes |
controllable | no | no | yes |
managed by | os scheduler | os scheduler + interpreter | event loop |
parallelism | yes | no | no |
switching | at any time | after any bytecode | at user-defined points |
shared state | no | yes | yes |
startup time | biggest/medium* | medium | smallest |
CPU overhead** | biggest | medium | smallest |
memory overhead | biggest | medium | smallest |
pool class | multiprocessing.Pool | multiprocessing.dummy.Pool | asyncio.BaseEventLoop |
solo class | multiprocessing.Process | threading.Thread | asyncio.coroutine |
* biggest - on Windows and if using 'spawn' ('fork'+'exec'); medium - if using 'fork' alone
** due to context switching
I started this little survey out of curiosity and professional needs to speed up our Python production systems. What I can tell from the experience gained in that field is that you basically need to ask yourself two questions:
- Does this code run faster with concurrency?
- Is the code cpu-bound or i/o-bound?
In the course of the thread, Steve Dower responded very ingeniously (here and here) which was my main driver to extend the survey. It explains certain values in the table quite easily, so I am going to quote him here for your convenience.
Steve Dower's "Let's Bake a Cake"
Let's say you are making a cake. There are two high-level steps involved:I further want to thank everybody who participated in discussing the matter and thus improved the survey enormously with a lot of patient explanations and insightful details.
You are personally required to do steps 1 and 2 ("hands-on"). They takes all of your time and attention and you can't do anything else simultaneously.
- Gather all the ingredients
- Mix all the ingredients
- Bake it in the oven
For step 3, you hand off the work to the oven. While the oven is baking, you are basically free to do other things.
In this analogy, "you" are the main thread and the oven is another thread. (Thread and process are interchangeable here in the general sense - the GIL in Python is practicality that makes processes preferable, but that doesn't affect the concepts.) Steps 1 and 2 are CPU bound (as far as "you" the main thread are concerned), and step 3 is IO bound from "your" (the main thread's) point-of-view.
Step 3 requires you to wait until it is complete:
This last option is most efficient for you, because you aren't interrupted at awkward times (i.e. greatly reduced need for locking on shared state) but you also don't have to walk all the way over to the oven to check whether it is done. You pause, listen, and get straight back to work if the oven is still going. That's the core feature of asyncio - not the networking or subprocess support - the ability to be notified efficiently that a task is complete without being interrupted by that notification.
- You can do a synchronous wait, by sitting and staring at the oven until it's done.
- You can poll, by occasionally interrupting yourself to walk over to the oven and see if it's done yet.
- You can use a signal/interrupt, when the oven is ready, regardless of whether you are ready to handle the interruption (but note: you know that the oven is done without having to walk over and check it).
- Or you can use asyncio, where you occasionally interrupt yourself and, when you do, the oven will make some noise if it has finished. (and if you never interrupt yourself, the oven never makes a sound)
Now let's expand this to making 3 cakes in parallel to see how "parallelism" works. Since there's so much going on, we'll create a TODO list:
(This means we've started three tasks to the current event loop. It's likely these are three external requests from clients, such as HTTP requests. It is possible, though not common in my experience, for production software to explicitly start with multiple tasks like this. More common is to have one task and a UI event loop that injects UI events as necessary.)
- Make cake #1
- Make cake #2
- Make cake #3
Task 1 is the obvious place to start, so we take that off the TODO list and start working on it. The steps to make cake #1 are:
Gathering ingredients is a synchronous operation (`def gather_ingredients()`) so we do that until we've gathered everything.
- Gather ingredients for cake #1
- Mix ingredients for cake #1
- Bake cake #1
Mixing ingredients is a long, interruptible operation (`async def mix_ingredients()`, with occasional explicit `await yield()` or whatever syntax was chosen for this), so we start mixing and then pause. When we pause, we put our current task on the TODO list:
We see that our next task is to make cake #2, so we repeat the steps above and eventually pause while we're mixing. Now the TODO list looks like:
- Make cake #2
- Make cake #3
- Continue mixing cake #1
And this continues. (Note that selecting which task to continue with is a detail of the event loop you're using. Check the spec to see whether some tasks have a higher priority or what order tasks are continued in. And bear in mind that so far, we've only used explicit yields - "I'm ready to do something else now if something needs doing".)
- Make cake #3
- Continue mixing cake #1
- Continue mixing cake #2
Eventually we will finish mixing one of the cakes, let's say it's cake #1. We will put it in the oven (`await put_in_oven()`) and then check the TODO list for what we should do next. There's nothing for us to do with cake #1, so our TODO list looks like:
Eventually, the oven will finish baking cake #1 and will add its own item to the TODO list:
- Continue mixing cake #2
- Continue mixing cake #3
When we take a break from mixing cake #2, we will continue mixing cake #3 (again, depending on your event loop's policy with regards to prioritisation). When we take a break from mixing cake #3, "Cake #1 is ready" will be the top of our TODO list and so we will continue with the statement following where we awaited it (it probably looked like `await put_in_oven(); remove_from_oven()` or maybe `baked_cake = await put_in_oven(mixed_ingredients)`).
- Continue mixing cake #2
- Continue mixing cake #3
- Cake #1 is ready
Eventually our TODO list will be empty, and so we will sit there waiting for something to appear on it (such as another incoming request, or an oven adding a "remove cake" item).
Processes and threads only really enter into asyncio as a "thing that can post messages back to my TODO list/event loop", while asyncio provides an efficient mechanism for interleaving (not parallelising) multiple tasks throughout an entire application (or a very significant self-contained piece of it). The parallelism only comes when all the main thread has to do for a particular task is wait, because another thread/process/service/device/etc. is doing the actual work.
-----"But I still have a question: why can't we use threads for the cakes? (1 cake = 1 thread)."
Because that is the wrong equality - it's really 1 baker = 1 thread.
Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time), and you will waste some of your own time coordinating them (interthread communication).
You also only have one set of baking equipment (the GIL), buying another bakery is expensive (another process) and fitting more equipment into the current one is very complicated (subinterpreters).
So you either pay a high price for 2 bakers = 2 cakes, or you accept 2 bakers = 1.5 cakes (in the same amount of time). It turns out that often 1 baker can do 1.5 cakes in the same time as well, and it's much easier to reason about and implement correctly.
There will be another post covering a module, I've written back then and which I maintain actively. It was and still is a way of digesting the concurrency matter in an attempt to improve usability and reduce boilerplate in Python.
Best,
Sven
No comments:
Post a Comment