Dask provides two main components:

  1. A dynamic task scheduler that optimizes computation as a farm of tasks. Once tasks are formed, Dask can pipeline them as a workflow engine/manager and schedule them on the available resources.

  2. Data structure representations called collections, which enable larger-than-memory computation when dealing with Big Data. When working with these collections, a Python developer stays in familiar territory, using interfaces similar to NumPy, Pandas, or Python futures. These collections rely on the task scheduler to run.
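The two components above can be illustrated with a minimal sketch (array sizes here are hypothetical; requires the `dask` package):

```python
import dask
import dask.array as da

# Collection: a chunked array that need not fit in memory at once.
x = da.ones((4000, 4000), chunks=(1000, 1000))  # 16 chunks of 1000x1000
total = x.sum()  # builds a task graph; nothing is computed yet

# Task scheduler: lazy tasks are pipelined and executed on demand.
@dask.delayed
def double(n):
    return 2 * n

graph = double(double(10))  # two chained tasks, still lazy

print(total.compute())  # → 16000000.0
print(graph.compute())  # → 40
```

Both `compute()` calls hand the accumulated task graph to the scheduler, which runs the tasks chunk by chunk.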

Dask on Shaheen

The following launches a dask-mpi backend and a Jupyter notebook from which the Dask client can connect to the backend scheduler. The example below shows a multicore, multi-node job, with each Dask worker starting multiple threads. We also publish the Dask dashboard on a separate port for visualization.

Note that this example demonstrates how one would scale out Dask to obtain more cores and memory than are available on a single node. Please adjust the number of nodes up or down as appropriate for your case.


The following jobscript launches Dask with 8 workers (i.e. 2 workers per node on 4 nodes), each worker with 16 threads:


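The original jobscript is not preserved in this page; the following is a minimal sketch of such a jobscript, assuming a SLURM scheduler and hedging that the module names, environment name, file paths, and port numbers are placeholders to adapt to your setup:

```shell
#!/bin/bash
#SBATCH --job-name=dask-mpi
#SBATCH --nodes=4                 # scale up or down for your case
#SBATCH --ntasks-per-node=2       # 2 Dask ranks per node
#SBATCH --cpus-per-task=16        # 16 threads per worker
#SBATCH --time=01:00:00

# Hypothetical environment setup; replace with your own modules/env.
module load python
source activate dask-env

# Launch the dask-mpi backend. Rank 0 runs the scheduler and the
# remaining ranks run workers, so add one extra task if you need the
# full worker count. The scheduler address is written to a file that
# the notebook's client reads later.
srun dask-mpi --scheduler-file="$PWD/scheduler.json" \
              --nthreads=16 \
              --dashboard-address=:8787 &

# Start Jupyter on the first node; forward ports 8888 (notebook) and
# 8787 (Dask dashboard) over SSH to view them locally.
jupyter notebook --no-browser --ip="$(hostname)" --port=8888
```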

Connecting Dask client to scheduler backend

The following notebook demonstrates how to connect the Dask client to the scheduler backend:
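The notebook itself is not reproduced on this page; the following is a minimal sketch of the client-side connection, assuming the dask-mpi backend wrote a scheduler file named `scheduler.json` (a hypothetical path) and that the `distributed` package is installed:

```python
from dask.distributed import Client

# On the cluster, connect to the dask-mpi backend via its scheduler file
# (hypothetical path):
#   client = Client(scheduler_file="scheduler.json")
# For a self-contained demo, start an in-process cluster instead:
client = Client(processes=False)

# Work submitted through the client runs on the Dask workers.
future = client.submit(sum, range(100))
print(future.result())        # → 4950
print(client.dashboard_link)  # dashboard URL for visualization
client.close()
```

Once connected, any Dask collection computation in the notebook is dispatched to the backend workers automatically.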