...

  1. A dynamic task scheduler that optimizes a computation as a farm of tasks. Once the tasks are formed, Dask can act as a workflow engine/manager, pipelining the tasks and scheduling them on the available resources.

  2. Data structure representations called collections, which are conducive to larger-than-memory computation when dealing with big data. When working with these data structures, a Python developer stays in familiar territory, because they mirror NumPy, pandas, Python futures, etc. These collections rely on the task scheduler to run.
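To make the two components above concrete, here is a minimal sketch, assuming `dask` and `numpy` are installed. The array shape, the chunk size, and the helper names `inc` and `add` are illustrative choices, not taken from this page. The first part builds a small task graph by hand with `dask.delayed`; the second uses a chunked Dask array, a collection that mirrors the NumPy interface.

```python
import dask
import dask.array as da


# --- 1. Task scheduling: build a tiny task graph by hand ---------------
@dask.delayed
def inc(i):
    # Hypothetical helper: one task in the graph.
    return i + 1


@dask.delayed
def add(a, b):
    # Hypothetical helper: depends on the results of two other tasks.
    return a + b


# Nothing runs yet; this only records the dependencies between tasks.
graph = add(inc(1), inc(2))

# compute() hands the graph to the scheduler, which runs the tasks.
value = graph.compute()

# --- 2. Collections: a chunked array with a NumPy-like interface -------
# The array is split into 1000 x 1000 chunks (illustrative size); each
# chunk becomes one or more tasks for the scheduler.
x = da.ones((4000, 4000), chunks=(1000, 1000))

# Still lazy: this expression only extends the task graph.
total = (x + x.T).sum()

# Trigger the computation and bring the scalar result back.
result = float(total.compute())
```

By default both `compute()` calls use a local scheduler; the same code runs unchanged once a distributed client is connected.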


Dask on Shaheen

The following launches a dask-mpi backend and a Jupyter notebook, from which a Dask client can connect to the backend scheduler. The example below shows a multicore, multi-node job in which each Dask worker starts with multiple threads. The Dask dashboard is also published on a separate port so that the computation can be visualized.

...
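On the notebook side, connecting a client to a running scheduler and submitting work might look like the sketch below. This is not the job script for Shaheen: as a stand-in that runs anywhere, it starts an in-process `LocalCluster` with two workers of two threads each instead of a dask-mpi backend. The worker counts, the dashboard port choice, and the helper `square` are assumptions for illustration. With a real dask-mpi backend that writes a scheduler file, the client would typically be created with `Client(scheduler_file=...)` instead, pointing at that file.

```python
from dask.distributed import Client, LocalCluster


def square(v):
    # Hypothetical workload: one small task per input value.
    return v * v


# Stand-in for the dask-mpi backend: two workers with two threads each,
# all inside this process. ":0" asks for a free port for the dashboard.
cluster = LocalCluster(
    n_workers=2,
    threads_per_worker=2,
    processes=False,
    dashboard_address=":0",
)

# With a dask-mpi backend, this line would instead be something like
# Client(scheduler_file="<path to the scheduler file>").
client = Client(cluster)

# Submit one task per input; the scheduler spreads them over the workers.
futures = client.map(square, range(8))

# Reduce on the cluster and fetch only the final value.
total = client.submit(sum, futures).result()

# Number of workers the scheduler currently knows about.
n_workers = len(client.scheduler_info()["workers"])

client.close()
cluster.close()
```

The futures returned by `client.map` stay on the workers until a result is requested, so only the final sum travels back to the notebook.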