You are currently viewing The most effective Python libraries for parallel processing

The most effective Python libraries for parallel processing

Python is highly effective, versatile, and programmer-friendly, however it isn’t the quickest programming language round. A few of Python’s velocity limitations are because of its default implementation, CPython, being single-threaded. That’s, CPython doesn’t use multiple {hardware} thread at a time.

And whereas you should utilize Python’s built-in threading module to hurry issues up, threading solely offers you concurrency, not parallelism. It’s good for working a number of duties that aren’t CPU-dependent, however does nothing to hurry up a number of duties that every require a full CPU. This will likely change sooner or later, however for now, it’s finest to imagine threading in Python gained’t offer you parallelism.

Python does embody a local approach to run a workload throughout a number of CPUs. The multiprocessing module spins up a number of copies of the Python interpreter, every on a separate core, and gives primitives for splitting duties throughout cores. However generally even multiprocessing isn’t sufficient.

In some instances, the job requires distributing work not solely throughout a number of cores, but in addition throughout a number of machines. That’s the place the Python libraries and frameworks launched on this article are available in. Listed below are seven frameworks you should utilize to unfold an current Python utility and its workload throughout a number of cores, a number of machines, or each.

The most effective Python libraries for parallel processing

  • Ray – parallelizes and distributes AI and machine studying workloads throughout CPUs, machines, and GPUs
  • Dask – parallelizes Python information science libraries corresponding to NumPy, Pandas, and Scikit-learn
  • Dispy – executes computations in parallel throughout a number of processors or machines
  • Pandaral•lel – parallelizes Pandas throughout a number of CPUs
  • Ipyparallel – permits interactive parallel computing with IPython, Jupyter Pocket book, and Jupyter Lab
  • Joblib – executes computations in parallel, with optimizations for NumPy and clear disk caching of capabilities and output values
  • Parsl – helps parallel execution throughout a number of cores and machines, together with chaining capabilities collectively into multi-step workflows


Developed by a staff of researchers on the College of California, Berkeley, Ray underpins numerous distributed machine studying libraries. However Ray isn’t restricted to machine studying duties alone, even when that was its authentic use case. You’ll be able to break up and distribute any sort of Python activity throughout a number of techniques with Ray.

Ray’s syntax is minimal, so that you don’t want to transform current functions extensively to parallelize them. The @ray.distant decorator distributes that operate throughout any accessible nodes in a Ray cluster, with optionally specified parameters for what number of CPUs or GPUs to make use of. The outcomes of every distributed operate are returned as Python objects, in order that they’re straightforward to handle and retailer, and the quantity of copying throughout or inside nodes is minimal. This final characteristic is useful when coping with NumPy arrays, as an example.

Ray even contains its personal built-in cluster supervisor, which might routinely spin up nodes as wanted on native {hardware} or widespread cloud computing platforms. Different Ray libraries allow you to scale widespread machine studying and information science workloads, so that you don’t need to manually scaffold them. As an illustration, Ray Tune enables you to carry out hyperparameter turning at scale for commonest machine studying techniques (PyTorch and TensorFlow, amongst others).


From the skin, Dask appears loads like Ray. It, too, is a library for distributed parallel computing in Python, with its personal activity scheduling system, consciousness of Python information frameworks like NumPy, and the power to scale from one machine to many.

One key distinction between Dask and Ray is the scheduling mechanism. Dask makes use of a centralized scheduler that handles all duties for a cluster. Ray is decentralized, which means every machine runs its personal scheduler, so any points with a scheduled activity are dealt with on the stage of the person machine, not the entire cluster. Dask’s activity framework works hand-in-hand with Python’s native concurrent.futures interfaces, so for many who’ve used that library, a lot of the metaphors for a way jobs work ought to be acquainted.

Dask works in two primary methods. The primary is by means of parallelized information constructions—primarily, Dask’s personal variations of NumPy arrays, lists, or Pandas DataFrames. Swap within the Dask variations of these constructions for his or her defaults, and Dask will routinely unfold their execution throughout your cluster. This sometimes entails little greater than altering the title of an import, however could generally require rewriting to work fully.

The second approach is thru Dask’s low-level parallelization mechanisms, together with operate decorators, that parcel out jobs throughout nodes and return outcomes synchronously (in “rapid” mode) or asynchronously (“lazy” mode). Each modes could be blended as wanted.

Dask additionally provides a characteristic referred to as actors. An actor is an object that factors to a job on one other Dask node. This fashion, a job that requires a whole lot of native state can run in-place and be referred to as remotely by different nodes, so the state for the job doesn’t need to be replicated. Ray lacks something like Dask’s actor mannequin to assist extra refined job distribution. Nevertheless, Desk’s scheduler isn’t conscious of what actors do, so if an actor runs wild or hangs, the scheduler can’t intercede. “Excessive-performing however not resilient” is how the documentation places it, so actors ought to be used with care.


Dispy enables you to distribute entire Python packages or simply particular person capabilities throughout a cluster of machines for parallel execution. It makes use of platform-native mechanisms for community communication to maintain issues quick and environment friendly, so Linux, macOS, and Home windows machines work equally effectively. That makes it a extra generic resolution than others mentioned right here, so it’s value a glance in case you want one thing that isn’t particularly about accelerating machine-learning duties or a specific data-processing framework.

Dispy syntax considerably resembles multiprocessing in that you simply explicitly create a cluster (the place multiprocessing would have you ever create a course of pool), submit work to the cluster, then retrieve the outcomes. A little bit extra work could also be required to change jobs to work with Dispy, however you additionally acquire exact management over how these jobs are dispatched and returned. As an illustration, you possibly can return provisional or partially accomplished outcomes, switch recordsdata as a part of the job distribution course of, and use SSL encryption when transferring information.


Pandaral·lel, because the title implies, is a approach to parallelize Pandas jobs throughout a number of machines. The draw back is that Pandaral·lel works solely with Pandas. But when Pandas is what you’re utilizing, and all you want is a approach to speed up Pandas jobs throughout a number of cores on a single laptop, Pandaral·lel is laser-focused on the duty.

Be aware that whereas Pandaral·lel does run on Home windows, it’ll run solely from Python classes launched within the Home windows Subsystem for Linux. Linux and macOS customers can run Pandaral·lel as-is. 


Ipyparallel is one other tightly centered multiprocessing and task-distribution system, particularly for parallelizing the execution of Jupyter pocket book code throughout a cluster. Initiatives and groups already working in Jupyter can begin utilizing Ipyparallel instantly.

Ipyparallel helps many approaches to parallelizing code. On the easy finish, there’s map, which applies any operate to a sequence and splits the work evenly throughout accessible nodes. For extra complicated work, you possibly can enhance particular capabilities to at all times run remotely or in parallel.

Jupyter notebooks assist “magic instructions” for actions which are solely potential in a pocket book atmosphere. Ipyparallel provides just a few magic instructions of its personal. For instance, you possibly can prefix any Python assertion with %px to routinely parallelize it.


Joblib has two main targets: run jobs in parallel and don’t recompute outcomes if nothing has modified. These efficiencies make Joblib well-suited for scientific computing, the place reproducible outcomes are sacrosanct. Joblib’s documentation gives loads of examples for learn how to use all its options.

Joblib syntax for parallelizing work is straightforward sufficient—it quantities to a decorator that can be utilized to separate jobs throughout processors, or to cache outcomes. Parallel jobs can use threads or processes.

Joblib features a clear disk cache for Python objects created by compute jobs. This cache not solely helps Joblib keep away from repeating work, as famous above, however will also be used to droop and resume long-running jobs, or decide up the place a job left off after a crash. The cache can be intelligently optimized for big objects like NumPy arrays. Areas of knowledge could be shared in-memory between processes on the identical system by utilizing numpy.memmap. This all makes Joblib extremely helpful for work that will take a very long time to finish, since you possibly can keep away from redoing current work and pause/resume as wanted.

One factor Joblib doesn’t supply is a approach to distribute jobs throughout a number of separate computer systems. In principle it’s potential to make use of Joblib’s pipeline to do that, however it’s in all probability simpler to make use of one other framework that helps it natively. 


Quick for “Parallel Scripting Library,” Parsl enables you to take computing jobs and cut up them throughout a number of techniques utilizing roughly the identical syntax as Python’s current Pool objects. It additionally enables you to sew collectively totally different computing duties into multi-step workflows, which might run in parallel, in sequence, or through map/scale back operations.

Parsl enables you to execute native Python functions, but in addition run another exterior utility by means of instructions to the shell. Your Python code is simply written like regular Python code, save for a particular operate decorator that marks the entry level to your work. The job-submission system additionally offers you fine-grained management over how issues run on the targets—for instance, the variety of cores per employee, how a lot reminiscence per employee, CPU affinity controls, how typically to ballot for timeouts, and so forth.

One wonderful characteristic Parsl provides is a set of prebuilt templates to dispatch work to quite a lot of high-end computing assets. This not solely contains staples like AWS or Kubernetes clusters, however supercomputing assets (assuming you have got entry) like Blue Waters, ASPIRE 1, Frontera, and so forth. (Parsl was co-developed with assistance from most of the establishments that constructed such {hardware}.)

Python’s limitations with threads will proceed to evolve, with main adjustments slated to permit threads to run side-by-side for CPU-bound work. However these updates are years away from being usable. Libraries designed for parallelism may help fill the hole whereas we wait. 

Copyright © 2023 IDG Communications, Inc.

Leave a Reply