Interactive computing using Jupyter Notebooks on KSL platforms
In this article, I explain how to launch a Jupyter server on a compute node of Shaheen or Ibex and connect to it from a Jupyter client running in the browser on your local laptop/workstation.
The steps are as follows:
1. Submit the Jupyter server to the SLURM scheduler as a jobscript that requests a resource allocation and launches Jupyter once the allocation becomes available. Note that all the required modules should be loaded in your jobscript before submitting.
2. Reverse connect to the compute node(s) running the Jupyter server.
3. Launch the Jupyter client in your local browser.
4. Terminate/end the session.
Jupyter server jobscript
Let’s first look at how to launch a Jupyter server on an Ibex GPU node and connect to it.
Ibex compute node
For example, the following is a jobscript requesting GPU resources on Ibex.
#!/bin/bash --login
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=v100:1
#SBATCH --cpus-per-gpu=6
#SBATCH --mem=32G
#SBATCH --partition=batch
#SBATCH --job-name=demo
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j-slurm.out
#SBATCH --error=%x-%j-slurm.err
# Load environment which has Jupyter installed. It can be one of the following:
# - Machine Learning module installed on the system (module load machine_learning)
# - your own conda environment on Ibex
# - a singularity container with python environment (conda or otherwise)
# setup the environment
module purge
# You can use the machine learning module
module load machine_learning/2022.11
# or you can activate the conda environment directly by uncommenting the following lines
#export ENV_PREFIX=$PWD/env
#conda activate $ENV_PREFIX
# setup ssh tunneling
# get tunneling info
export XDG_RUNTIME_DIR=/tmp
node=$(hostname -s)
user=$(whoami)
submit_host=${SLURM_SUBMIT_HOST}
port=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo ${node} pinned to port ${port} on ${submit_host}
# print tunneling instructions
echo -e "
${node} pinned to port ${port} on ${submit_host}
To connect to the compute node ${node} on Ibex running your Jupyter server, do the following two steps in a terminal:
1. Create an SSH tunnel from your workstation/laptop to glogin:
ssh -L ${port}:${node}.ibex.kaust.edu.sa:${port} ${user}@glogin.ibex.kaust.edu.sa
2. Copy the URL printed below by the Jupyter server, replace the node name with localhost, and paste it into the browser on your workstation/laptop.
" >&2
# Run Jupyter
#jupyter notebook --no-browser --port=${port} --port-retries=0 --ip=${node}
# launch jupyter server
jupyter ${1:-lab} --no-browser --port=${port} --port-retries=0 --ip=${node}.ibex.kaust.edu.sa
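The jobscript picks a free port with an inline Python one-liner: binding a socket to port 0 makes the OS assign an unused ephemeral port, which Jupyter then listens on. Written out as a standalone sketch:

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused port by binding to port 0."""
    s = socket.socket()
    s.bind(("", 0))           # port 0 means "any free port"
    port = s.getsockname()[1]
    s.close()                 # release it so Jupyter can bind it
    return port

print(find_free_port())
```

There is a small race between closing the socket and Jupyter re-binding the port, which is why the jobscript also passes --port-retries=0: if the port was taken in the meantime, Jupyter fails fast instead of silently moving to a different port than the one in your tunnel command.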
To submit the above jobscript (e.g. jupyter_notebook.slurm) to the scheduler:
sbatch jupyter_notebook.slurm
Once the job starts, the SLURM output file created in the directory you submitted the job from will have the instructions on how to reverse connect.
The SLURM output will look something like this:
Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
UCX 1.9.0 is now loaded
Open MPI 4.0.3 is now loaded
Loading module for cudnn
cudnn is now loaded
Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
GNU 9.2.0 is now loaded
Loading module for Machine Learning 2022.11
Machine Learning 2022.11 is now loaded
gpu214-06 pinned to port 55479 on login510-27
To connect to the compute node gpu214-06 on Ibex running your Jupyter server, do the following two steps in a terminal:
1. Create an SSH tunnel from your workstation/laptop to glogin:
ssh -L 55479:gpu214-06.ibex.kaust.edu.sa:55479 barradd@glogin.ibex.kaust.edu.sa
2. Copy the URL printed below by the Jupyter server, replace the node name with localhost, and paste it into the browser on your workstation/laptop.
[I 2023-01-31 08:38:56.289 ServerApp] dask_labextension | extension was successfully linked.
[I 2023-01-31 08:38:56.298 ServerApp] jupyter_server_mathjax | extension was successfully linked.
[I 2023-01-31 08:38:56.298 ServerApp] jupyter_server_proxy | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab_git | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab_nvdashboard | extension was successfully linked.
[I 2023-01-31 08:38:56.320 ServerApp] nbclassic | extension was successfully linked.
[I 2023-01-31 08:38:56.320 ServerApp] nbdime | extension was successfully linked.
[I 2023-01-31 08:38:56.492 ServerApp] dask_labextension | extension was successfully loaded.
[I 2023-01-31 08:38:56.493 ServerApp] jupyter_server_mathjax | extension was successfully loaded.
[I 2023-01-31 08:38:57.035 ServerApp] jupyter_server_proxy | extension was successfully loaded.
[I 2023-01-31 08:38:57.036 LabApp] JupyterLab extension loaded from /sw/csgv/machine_learning/2022.11/el7_cudnn8.2_cuda11.2_py3.8_env/machine_learning-module/env/lib/python3.9/site-packages/jupyterlab
[I 2023-01-31 08:38:57.036 LabApp] JupyterLab application directory is /sw/csgv/machine_learning/2022.11/el7_cudnn8.2_cuda11.2_py3.8_env/machine_learning-module/env/share/jupyter/lab
[I 2023-01-31 08:38:57.040 ServerApp] jupyterlab | extension was successfully loaded.
[I 2023-01-31 08:38:57.044 ServerApp] jupyterlab_git | extension was successfully loaded.
[W 2023-01-31 08:38:57.044 ServerApp] jupyterlab_nvdashboard | extension failed loading with message: 'NoneType' object is not callable
[E 2023-01-31 08:38:57.044 ServerApp] jupyterlab_nvdashboard | stack trace
[I 2023-01-31 08:38:57.059 ServerApp] nbclassic | extension was successfully loaded.
[I 2023-01-31 08:38:57.215 ServerApp] nbdime | extension was successfully loaded.
[I 2023-01-31 08:38:57.216 ServerApp] Serving notebooks from local directory: /ibex/scratch/barradd/Data-science-onboarding-2022
[I 2023-01-31 08:38:57.216 ServerApp] Jupyter Server 1.21.0 is running at:
[I 2023-01-31 08:38:57.216 ServerApp] http://gpu214-06.ibex.kaust.edu.sa:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
[I 2023-01-31 08:38:57.216 ServerApp] or http://127.0.0.1:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
[I 2023-01-31 08:38:57.216 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2023-01-31 08:38:57.225 ServerApp]
To access the server, open this file in a browser:
file:///home/barradd/.local/share/jupyter/runtime/jpserver-44653-open.html
Or copy and paste one of these URLs:
http://gpu214-06.ibex.kaust.edu.sa:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
or http://127.0.0.1:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
Open a new terminal on your local machine and copy-paste the SSH tunnel command from the %x-%j-slurm.err file. This creates an SSH tunnel between your local machine and the Ibex compute node running your Jupyter server, forwarding localhost port 55479 (the port printed in the output) to the node.
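Before opening the browser, you can verify that the tunnel is actually up. A minimal sketch (a hypothetical helper, not part of the official instructions) that probes a local TCP port:

```python
import socket

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With the tunnel from the example running on your laptop:
# print(is_listening("localhost", 55479))
```

If this returns False, the tunnel terminal was probably closed, or the port in the ssh command does not match the one in the SLURM output.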
Now we are ready to launch our Jupyter client. Copy one of the last two URLs in the %x-%j-slurm.err file (the one starting with http://127.0.0.1 works directly through the tunnel) and paste it into your browser address bar:
Once it connects, the Jupyter interface looks as follows:
Be aware that the root directory in your Jupyter file browser is the directory you submitted the job from.
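You can confirm the working directory from the first notebook cell:

```python
import os

# The Jupyter server inherits the jobscript's working directory,
# i.e. the directory you ran sbatch from.
print(os.getcwd())
```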
We can now do some computations. Since this Jupyter job requested a GPU, let’s test it. Note that all the required modules should have been loaded in your jobscript before submitting.
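As a quick check from a notebook cell: SLURM typically exposes the allocated GPUs through the CUDA_VISIBLE_DEVICES environment variable, which frameworks such as PyTorch and TensorFlow (both available in the machine_learning module) honour. A small sketch (the helper name is my own, not part of the module):

```python
import os

def visible_gpus(env=None) -> list:
    """Return the GPU IDs granted to this job, or [] if none."""
    env = os.environ if env is None else env
    ids = env.get("CUDA_VISIBLE_DEVICES", "")
    return [g for g in ids.split(",") if g]

# Inside the example job this should show one GPU, e.g. ['0']
print(visible_gpus())
```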
Shaheen compute node
Running Jupyter on Shaheen compute nodes is not much different, except for one extra step. The jobscript is very similar:
As in Ibex, once the job starts, an output file will be created in your submission directory with some instructions:
Open a new terminal and copy the first step from the SLURM output file, i.e. the ssh command that creates a tunnel to a CDL login node of Shaheen:
Next, in the same terminal, copy-paste the following line, which ssh-es into the gateway from the CDL login node you have just logged in to:
We are now ready to connect to our Jupyter server. Examine the %x-%j-slurm.err file, copy the URL starting with https://127.0.0.1/…, and paste it into your local browser to open the Jupyter client.
SLURM output file:
Terminating running Interactive sessions
When you have finished computing, it is mandatory to terminate the session properly; otherwise the SLURM job on the server side will keep running until it hits its wall time.
To close the session correctly, go to the “File” menu and click “Shut Down”.
This triggers the “Shutdown confirmation” window; select “Shut Down”.
When you reach the final window, you should see that both the server and the SLURM job have been stopped.