I think the experiment is misguided for several reasons. One, process/thread creation time is negligible in practice. The general approach is to create worker threads/processes that live for the lifetime of the program and farm work out to them as needed, so you pay the creation cost once up front rather than once per task. This separates the concept of "doing work" from the threads or processes that actually execute it.
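To make the worker-pool idea concrete, here is a minimal sketch using the standard library's `concurrent.futures`. The `square` function and `run_jobs` helper are hypothetical names standing in for real work; nothing here is taken from the experiment being discussed.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Hypothetical stand-in for a real unit of work."""
    return n * n


def run_jobs(jobs, workers=4):
    # The pool starts at most `workers` threads and reuses them for every
    # job, so thread creation is not paid once per task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands jobs to idle workers and returns results in input order.
        return list(pool.map(square, jobs))


if __name__ == "__main__":
    print(run_jobs(range(10)))
```

Swapping in `ProcessPoolExecutor` gives the same pattern with processes; the calling code does not need to know which one is doing the work.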
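Two, threads don't buy you parallelism in Python (CPython's global interpreter lock lets only one thread run Python bytecode at a time), unless the majority of the work is being done in C modules that release the lock.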
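If you want to see this for yourself, a rough sketch like the following runs the same pure-Python, CPU-bound loop on a thread pool and on a process pool. It assumes a standard CPython build with the global interpreter lock; `count_down` and `timed` are hypothetical helper names, and the job size is arbitrary.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_down(n):
    # Pure-Python CPU-bound loop: no C extension work, so on a standard
    # CPython build the interpreter lock is needed for every iteration.
    while n > 0:
        n -= 1
    return n


def timed(executor_cls, jobs, workers=4):
    # Run every job on the given kind of pool and report wall-clock time.
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_down, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [5_000_000] * 4  # arbitrary size, chosen only for illustration
    _, t_threads = timed(ThreadPoolExecutor, jobs)
    _, t_procs = timed(ProcessPoolExecutor, jobs)
    print(f"threads:   {t_threads:.2f}s")
    print(f"processes: {t_procs:.2f}s")
```

On a multi-core machine I would expect the process version to finish noticeably sooner, but the actual numbers depend on the interpreter build and the hardware.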
Finally, this test is really just testing the multiprocessing and threading packages provided by Python. I say this is misguided because, from the way the author talks about it, I don't think he understands the difference between those abstractions and OS threads and processes. (Which, of course, are abstractions as well.) I suspect the Python overhead will be larger than the difference in cost between creating OS-level threads and processes.
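For reference, a test of this kind probably boils down to something like the sketch below, which is my own guess and not the author's code. Note what it actually times: constructing, starting, and joining the Python wrapper objects, including all the interpreter-side bookkeeping, not a bare OS-level thread or process creation. `noop` and `avg_start_join` are hypothetical names.

```python
import multiprocessing
import threading
import time


def noop():
    # The worker does nothing, so the timing is dominated by start-up
    # and tear-down of the worker itself.
    pass


def avg_start_join(factory, n=200):
    # Average wall-clock seconds to create, start, and join one worker.
    # `factory` is threading.Thread or multiprocessing.Process; both accept
    # a `target` callable and expose start() and join().
    start = time.perf_counter()
    for _ in range(n):
        worker = factory(target=noop)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print(f"thread:  {avg_start_join(threading.Thread) * 1e6:.0f} us")
    print(f"process: {avg_start_join(multiprocessing.Process) * 1e6:.0f} us")
```

Whatever numbers come out describe the Python packages on that platform (including which start method `multiprocessing` uses there), which is exactly why I would not read them as a statement about OS threads versus OS processes.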