Using Dask on Ibex

 

Dask from modulefile

On Ibex you can install dask in your own directory or opt to use the pre-installed module. To use dask from the pre-installed module, you can do the following:

module load dl module load dask

How to Install your own

Users can either install in a conda environment or via pip in there own directories whether /home or /ibex/scractch/$USER.

Using conda package

Here we assume that you have installed miniconda in your /home directory on Ibex. Installation instructions can be found on this GitHub page

While in you base environment create a new conda environment:

conda create -n dask python=3.7 conda activate dask

Once ready, you can install via pip installer:

pip install --no-cache-dir dask[complete] dask-mpi dask-jobqueue dask-ml jupyter notebook jupyter_server jupyter_tensorboard ipyparallel

This will bring the whole kitchen sinks. It includes dask-core, dask-distributed, dask-mpi launcher, and dask-jobqueue for high throughput task farm, amongst other things.

Using pip to install

If you don’t use conda package management and depend on the system installed python modules, you can install via pip.

module load python/3.7.0 export INSTALL_DIR=/ibex/scratch/shaima0d/dask pip install --no-cache-dir --ignore-installed \ --prefix=${INSTALL_DIR} dask[complete] dask-mpi dask-jobqueue dask-ml jupyter jupyter_server jupyter_tensorboard ipyparallel

The above should install all the dependencies in the prescribed install directory referenced by the --prefix flag.

Once installed, you will need to maintain some environment variables to enable your runtime to find dask and related executables.

export PYTHONPATH=${INSTALL_DIR}/lib/python3.7/site-packages:$PYTHONPATH export PATH=${INSTALL_DIR}/bin:$PATH

 

This should allow you to find both dask python modules and dask-mpi etc.

-bash-4.2$ python Python 3.7.0 (default, Oct 29 2019, 12:43:29) [GCC 6.4.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import dask >>> dask.__version__ '2021.09.1' >>> import numpy as np >>> np.__version__ '1.21.2' >>> np.__file__ '/ibex/scratch/shaima0d/dask/lib/python3.7/site-packages/numpy/__init__.py'

In the above you can see that even though we are using python from Ibex modulefile, we are loading dask and related dependencies e.g. numpy from INSTALL_DIR.

 

For dask-mpi which depends on mpi4py we recommend installing it with openmpi/4.0.3 modulefile loaded in the environment from Ibex. It has all the middleware integration e.g. Mellanox drivers and UCX. The following step applies to both conda and non-conda build. If using conda environment, we assume you already have loaded the right environment:

module load openmpi/4.0.3 env MPICC=$(which mpicc) mpi4py==3.0.1

 

 

Running Dask on Ibex

Dask can be run in a jupyter notebook. The following is an example of how to start a dask-distributed cluster and connect to it from a notebook.

#!/bin/bash -l #SBATCH --ntasks=3 #SBATCH --tasks-per-node=1 #SBATCH --cpus-per-task=4 #SBATCH --mem=20G #SBATCH --time=01:00:00 #SBATCH --account=ibex-cs module load dl module load gcc/11.1.0 module load dask module list mkdir workers${SLURM_JOBID} # get tunneling info export XDG_RUNTIME_DIR="" node=$(hostname -I | cut -d ' ' -f 2) user=$(whoami) submit_host=${SLURM_SUBMIT_HOST} sched_port=10021 jupyter_port=10022 dask_dashboard=10023 srun -n ${SLURM_NTASKS} -c ${SLURM_CPUS_PER_TASK} dask-mpi --worker-class distributed.Worker --local-directory=workers${SLURM_JOBID} --interface=ib0 --nthreads=${SLURM_CPUS_PER_TASK} --scheduler-port=${sched_port} \ --scheduler-file=scheduler_${SLURM_JOBID}.json --dashboard-address=${node}:${dask_dashboard} & sleep 20 echo $node pinned to port $port # print tunneling instructions jupyter-log echo -e " To connect to the compute node ${node} on IBEX running your jupyter notebook server, you need to run following two commands in a terminal 1. Command to create ssh tunnel from you workstation/laptop to cs-login: ssh -L ${jupyter_port}:${node}:${jupyter_port} -L ${dask_dashboard}:${node}:${dask_dashboard} ${user}@${submit_host}.ibex.kaust.edu.sa Copy the link provided below by jupyter-server and replace the NODENAME with localhost before pasting it in your browser on your workstation/laptop use localhost:${dask_dashboard} to view dask dashboard " jupyter lab --no-browser --port=${jupyter_port} --ip=${node}

For the rest of the example please refer to https://kaust-supercomputing-lab.atlassian.net/wiki/spaces/Doc/pages/271122487 example details.