Using Dask on Ibex

 

Dask from modulefile

On Ibex you can install dask in your own directory or opt to use the pre-installed module. To use dask from the pre-installed module, you can do the following:

module load dl module load dask

How to Install your own

Users can either install in a conda environment or via pip in there own directories whether /home or /ibex/scractch/$USER.

Using conda package

Here we assume that you have installed miniconda in your /home directory on Ibex. Installation instructions can be found on this GitHub page

While in you base environment create a new conda environment:

conda create -n dask python=3.7 conda activate dask

Once ready, you can install via pip installer:

pip install --no-cache-dir dask[complete] dask-mpi dask-jobqueue dask-ml jupyter notebook jupyter_server jupyter_tensorboard ipyparallel

This will bring the whole kitchen sinks. It includes dask-core, dask-distributed, dask-mpi launcher, and dask-jobqueue for high throughput task farm, amongst other things.

Using pip to install

If you don’t use conda package management and depend on the system installed python modules, you can install via pip.

The above should install all the dependencies in the prescribed install directory referenced by the --prefix flag.

Once installed, you will need to maintain some environment variables to enable your runtime to find dask and related executables.

 

This should allow you to find both dask python modules and dask-mpi etc.

In the above you can see that even though we are using python from Ibex modulefile, we are loading dask and related dependencies e.g. numpy from INSTALL_DIR.

 

For dask-mpi which depends on mpi4py we recommend installing it with openmpi/4.0.3 modulefile loaded in the environment from Ibex. It has all the middleware integration e.g. Mellanox drivers and UCX. The following step applies to both conda and non-conda build. If using conda environment, we assume you already have loaded the right environment:

 

 

Running Dask on Ibex

Dask can be run in a jupyter notebook. The following is an example of how to start a dask-distributed cluster and connect to it from a notebook.

For the rest of the example please refer to https://kaust-supercomputing-lab.atlassian.net/wiki/spaces/Doc/pages/271122487 example details.