Using Dask on Ibex
Dask from modulefile
On Ibex you can install dask
in your own directory or opt to use the pre-installed module. To use dask
from the pre-installed module, you can do the following:
module load dl
module load dask
How to Install your own
Users can either install in a conda
environment or via pip
in there own directories whether /home
or /ibex/scractch/$USER
.
Using conda
package
Here we assume that you have installed miniconda
in your /home
directory on Ibex. Installation instructions can be found on this GitHub page
While in you base environment create a new conda
environment:
conda create -n dask python=3.7
conda activate dask
Once ready, you can install via pip
installer:
pip install --no-cache-dir dask[complete] dask-mpi dask-jobqueue dask-ml jupyter notebook jupyter_server jupyter_tensorboard ipyparallel
This will bring the whole kitchen sinks. It includes dask-core
, dask-distributed
, dask-mpi
launcher, and dask-jobqueue
for high throughput task farm, amongst other things.
Using pip
to install
If you don’t use conda
package management and depend on the system installed python
modules, you can install via pip
.
The above should install all the dependencies in the prescribed install directory referenced by the --prefix
flag.
Once installed, you will need to maintain some environment variables to enable your runtime to find dask
and related executables.
This should allow you to find both dask
python modules and dask-mpi
etc.
In the above you can see that even though we are using python
from Ibex modulefile, we are loading dask
and related dependencies e.g. numpy
from INSTALL_DIR
.
For dask-mpi
which depends on mpi4py
we recommend installing it with openmpi/4.0.3
modulefile loaded in the environment from Ibex. It has all the middleware integration e.g. Mellanox drivers and UCX. The following step applies to both conda
and non-conda build. If using conda
environment, we assume you already have loaded the right environment:
Running Dask on Ibex
Dask can be run in a jupyter
notebook. The following is an example of how to start a dask-distributed
cluster and connect to it from a notebook.
For the rest of the example please refer to https://kaust-supercomputing-lab.atlassian.net/wiki/spaces/Doc/pages/271122487 example details.