Introduction
- Overview
- Concurrency :
- enables a computer to do many different things seemingly at the same time.
- allows multiple jobs to take turns accessing the same shared resources, such as a disk, the network, or a single CPU core.
- For I/O-bound tasks, Python offers two different mechanisms: threading and asyncio.
- Parallelism :
- involves actually doing many different things at the same time.
- For CPU-bound tasks, Python offers multiple processes through two standard-library modules:
- multiprocessing
- subprocess
1. Threading
- Threads : units of work that let you take one or more functions and execute them independently of the rest of the program. You can then aggregate the results, typically by waiting for all threads to run to completion.
- Global Interpreter Lock (GIL) : in CPython, only one thread executes Python bytecode at a time because of the GIL, so Python threads can’t run in parallel on multiple CPU cores.
- Advantages
- Python threads are still useful despite the GIL, because they provide an easy way to do multiple things seemingly at the same time.
- Use Python threads to make multiple system calls in parallel. This lets you do blocking I/O (reading and writing files, interacting with networks, communicating with devices such as displays, and so on) at the same time as computation.
- Disadvantages
- Threads take turns. The Python runtime divides its attention between them so that objects accessed by threads can be managed correctly, and only one thread runs Python code at any moment.
- As a result, threads shouldn’t be used for CPU-intensive work. If you run a CPU-intensive operation in a thread, it is paused whenever the runtime switches to another thread, so there is no performance benefit over running that operation outside of a thread.
import requests
from concurrent.futures import ThreadPoolExecutor, wait

def io_bounded(url):
    # Blocking network call; the thread sits idle while waiting for the response
    return requests.get(url, timeout=10).content

urls = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

with ThreadPoolExecutor(max_workers=5) as ex:
    url_to_futures = {url: ex.submit(io_bounded, url) for url in urls}
    futures = url_to_futures.values()
    # wait() blocks until every future finishes or the timeout expires and returns
    # two sets (done, not_done); use as_completed() to handle results as they arrive
    done, not_done = wait(futures, timeout=0.3)
    # result() blocks until this future finishes and re-raises any worker exception
    print(url_to_futures['http://www.foxnews.com/'].result()[:1000])
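The comment in the example above mentions as_completed as the alternative to wait. A minimal sketch of that pattern follows. It assumes a hypothetical fake_fetch function that simulates blocking I/O with time.sleep, so it runs without network access; the URLs and delays are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical delays (seconds) standing in for network latency
DELAYS = {
    "http://a.example/": 0.4,
    "http://b.example/": 0.2,
    "http://bad.example/": 0.3,
}

def fake_fetch(url):
    """Simulate a blocking request; time.sleep releases the GIL like real I/O."""
    time.sleep(DELAYS[url])
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return f"body of {url}"

def fetch_all(urls):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=len(urls)) as ex:
        future_to_url = {ex.submit(fake_fetch, url): url for url in urls}
        # as_completed yields each future as soon as it finishes (completion order)
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except ConnectionError as exc:
                # result() re-raises the exception raised inside the worker
                errors[url] = str(exc)
    return results, errors

start = time.perf_counter()
results, errors = fetch_all(list(DELAYS))
elapsed = time.perf_counter() - start
print(f"{len(results)} ok, {len(errors)} failed in {elapsed:.2f}s")
```

Because the three simulated requests overlap, the total time should be close to the slowest delay rather than the sum of all three. Catching the exception per future also keeps one failing URL from hiding the other results.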
2. Asyncio
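- Coroutines : a different way to execute functions concurrently in Python. They are defined with async def and run on an event loop in a single thread, handing control to each other only at await points.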
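As a rough illustration of coroutines, the sketch below runs three of them concurrently on one event loop. The fake_download coroutine and its delays are hypothetical; asyncio.sleep stands in for a non-blocking network call.

```python
import asyncio
import time

async def fake_download(name, delay):
    """Hypothetical coroutine; awaiting asyncio.sleep hands control back to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # gather runs the coroutines concurrently and returns results in argument order
    return await asyncio.gather(
        fake_download("a", 0.4),
        fake_download("b", 0.2),
        fake_download("c", 0.3),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

All three coroutines share a single thread. While one is suspended at an await, the event loop runs another, so the waits overlap without any thread switching.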
3. Multiprocessing
- Multiprocessing : allows you to run many CPU-intensive tasks side by side by launching multiple, independent copies of the Python runtime.
- Advantages : With threading and coroutines, Python code still executes on one CPU core at a time. Multiprocessing sidesteps this limitation by giving each operation a separate Python runtime, which the operating system can schedule on its own CPU core.
- Disadvantages :
- Additional overhead is associated with creating the processes.
- Each subprocess needs a copy of the data it works with, sent to it from the main process. The data is serialized with pickle, so it must be picklable.
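import time
import multiprocessing

# If the number of processes is not given, the CPU count is used as the default.
# The number of processes running concurrently on your computer is not limited by the number of cores.
NUM_OF_PROC = multiprocessing.cpu_count()

def cpu_bounded(input_data):
    time.sleep(1)  # stand-in for real CPU-intensive work
    return "result"

if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when they import the module
    inputs = range(NUM_OF_PROC)  # placeholder inputs, one per process
    with multiprocessing.Pool(NUM_OF_PROC) as p:
        outputs = p.map(cpu_bounded, inputs)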
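The data-copying cost noted under Disadvantages comes from pickle: multiprocessing serializes arguments and return values to move them between processes. A small sketch of what that implies, using only the pickle module (the payload dictionary and the can_pickle helper are made up for illustration):

```python
import pickle

payload = {"ids": list(range(5)), "label": "batch-0"}  # made-up data

# Roughly what happens when data crosses a process boundary:
# it is serialized to bytes, then rebuilt as a separate object on the other side
wire_bytes = pickle.dumps(payload)
received = pickle.loads(wire_bytes)

received["ids"].append(99)  # changing the copy leaves the original untouched
print(payload["ids"], received["ids"])

def can_pickle(obj):
    """Hypothetical helper: True if obj survives pickling and could be sent to a worker."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# A built-in function pickles by reference; a lambda has no importable name
print(can_pickle(len), can_pickle(lambda x: x + 1))
```

This is why functions passed to Pool.map are usually defined at module level, and why large inputs add serialization overhead to every call.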
5. References