Interactive computing using Jupyter Notebooks on KSL platforms

In this article, I explain how to launch a Jupyter server on a compute node of Shaheen or Ibex and connect to it from the Jupyter client running in the browser on your local laptop/workstation.


The steps are as follows:

  1. Submit the Jupyter server instance to the SLURM scheduler as a jobscript that requests a resource allocation and launches Jupyter once the allocation becomes available. Note that all the required modules should be loaded in the jobscript before Jupyter is launched.

  2. Reverse connect to the compute node(s) running the Jupyter server

  3. Launch the Jupyter client in your local browser

  4. Terminate/end the session

Jupyter server Jobscript

Let’s first look at how to launch a Jupyter server on the Ibex GPU node and connect to it.

Ibex compute node

For example, the following is a jobscript requesting GPU resources on Ibex.

#!/bin/bash --login
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=v100:1
#SBATCH --cpus-per-gpu=6
#SBATCH --mem=32G
#SBATCH --partition=batch
#SBATCH --job-name=demo
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j-slurm.out
#SBATCH --error=%x-%j-slurm.err

# Load an environment which has Jupyter installed. It can be one of the following:
# - the Machine Learning module installed on the system (module load machine_learning)
# - your own conda environment on Ibex
# - a singularity container with a python environment (conda or otherwise)

# setup the environment
module purge

# You can use the machine learning module
module load machine_learning/2022.11
# or you can activate your conda environment directly by uncommenting the following lines
#export ENV_PREFIX=$PWD/env
#conda activate $ENV_PREFIX

# setup ssh tunneling: get tunneling info
export XDG_RUNTIME_DIR=/tmp
node=$(hostname -s)
user=$(whoami)
submit_host=${SLURM_SUBMIT_HOST}
port=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo ${node} pinned to port ${port} on ${submit_host}

# print tunneling instructions
echo -e "
${node} pinned to port ${port} on ${submit_host}

To connect to the compute node ${node} on IBEX running your jupyter notebook server,
run the following command in a terminal:

1. Command to create an ssh tunnel from your workstation/laptop to glogin:

ssh -L ${port}:${node}.ibex.kaust.edu.sa:${port} ${user}@glogin.ibex.kaust.edu.sa

Copy the link provided below by the jupyter server and replace the NODENAME with
localhost before pasting it into the browser on your workstation/laptop.
" >&2

# Run Jupyter
#jupyter notebook --no-browser --port=${port} --port-retries=0 --ip=${node}

# launch jupyter server
jupyter ${1:-lab} --no-browser --port=${port} --port-retries=0 --ip=${node}.ibex.kaust.edu.sa
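The jobscript finds a free port by asking the operating system to bind a socket to port 0, which makes the OS pick any currently unused TCP port. A minimal standalone sketch of that trick:

```python
import socket

def find_free_port() -> int:
    """Bind to port 0 so the OS assigns an unused TCP port, then return it."""
    s = socket.socket()
    s.bind(("", 0))          # port 0 = let the OS choose
    port = s.getsockname()[1]
    s.close()
    return port

print(find_free_port())      # e.g. 55479 -- different on every run
```

There is a small window between closing this socket and Jupyter re-binding the port, which is why the jobscript also passes --port-retries=0: Jupyter then fails fast instead of silently moving to a different port that the SSH tunnel does not point at.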


To submit the above jobscript (e.g. jupyter_notebook.slurm) to the scheduler:

sbatch jupyter_notebook.slurm

Once the job starts, the SLURM output file created in the directory you submitted the job from will have the instructions on how to reverse connect.

The SLURM output will look something like this:

Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
UCX 1.9.0 is now loaded
Open MPI 4.0.3 is now loaded
Loading module for cudnn
cudnn is now loaded
Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
GNU 9.2.0 is now loaded
Loading module for Machine Learning 2022.11
Machine Learning 2022.11 is now loaded

gpu214-06 pinned to port 55479 on login510-27

To connect to the compute node gpu214-06 on IBEX running your jupyter notebook server,
run the following command in a terminal:

1. Command to create an ssh tunnel from your workstation/laptop to glogin:

ssh -L 55479:gpu214-06.ibex.kaust.edu.sa:55479 barradd@glogin.ibex.kaust.edu.sa

Copy the link provided below by the jupyter server and replace the NODENAME with
localhost before pasting it into the browser on your workstation/laptop.

[I 2023-01-31 08:38:56.289 ServerApp] dask_labextension | extension was successfully linked.
[I 2023-01-31 08:38:56.298 ServerApp] jupyter_server_mathjax | extension was successfully linked.
[I 2023-01-31 08:38:56.298 ServerApp] jupyter_server_proxy | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab_git | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab_nvdashboard | extension was successfully linked.
[I 2023-01-31 08:38:56.320 ServerApp] nbclassic | extension was successfully linked.
[I 2023-01-31 08:38:56.320 ServerApp] nbdime | extension was successfully linked.
[I 2023-01-31 08:38:56.492 ServerApp] dask_labextension | extension was successfully loaded.
[I 2023-01-31 08:38:56.493 ServerApp] jupyter_server_mathjax | extension was successfully loaded.
[I 2023-01-31 08:38:57.035 ServerApp] jupyter_server_proxy | extension was successfully loaded.
[I 2023-01-31 08:38:57.036 LabApp] JupyterLab extension loaded from /sw/csgv/machine_learning/2022.11/el7_cudnn8.2_cuda11.2_py3.8_env/machine_learning-module/env/lib/python3.9/site-packages/jupyterlab
[I 2023-01-31 08:38:57.036 LabApp] JupyterLab application directory is /sw/csgv/machine_learning/2022.11/el7_cudnn8.2_cuda11.2_py3.8_env/machine_learning-module/env/share/jupyter/lab
[I 2023-01-31 08:38:57.040 ServerApp] jupyterlab | extension was successfully loaded.
[I 2023-01-31 08:38:57.044 ServerApp] jupyterlab_git | extension was successfully loaded.
[W 2023-01-31 08:38:57.044 ServerApp] jupyterlab_nvdashboard | extension failed loading with message: 'NoneType' object is not callable
[E 2023-01-31 08:38:57.044 ServerApp] jupyterlab_nvdashboard | stack trace
[I 2023-01-31 08:38:57.059 ServerApp] nbclassic | extension was successfully loaded.
[I 2023-01-31 08:38:57.215 ServerApp] nbdime | extension was successfully loaded.
[I 2023-01-31 08:38:57.216 ServerApp] Serving notebooks from local directory: /ibex/scratch/barradd/Data-science-onboarding-2022
[I 2023-01-31 08:38:57.216 ServerApp] Jupyter Server 1.21.0 is running at:
[I 2023-01-31 08:38:57.216 ServerApp] http://gpu214-06.ibex.kaust.edu.sa:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
[I 2023-01-31 08:38:57.216 ServerApp]  or http://127.0.0.1:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
[I 2023-01-31 08:38:57.216 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2023-01-31 08:38:57.225 ServerApp]

    To access the server, open this file in a browser:
        file:///home/barradd/.local/share/jupyter/runtime/jpserver-44653-open.html
    Or copy and paste one of these URLs:
        http://gpu214-06.ibex.kaust.edu.sa:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
     or http://127.0.0.1:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
  • Open a new terminal on your local machine and copy and paste the SSH tunnel command from the %x-%j-slurm.err file (the tunneling instructions are printed to stderr).

  • This creates an SSH tunnel between the Ibex compute node running your Jupyter server and your local machine, listening on localhost at the same port as the server (55479 in the example above).

  • Now we are ready to launch the Jupyter client. Copy one of the last two URLs in the %x-%j-slurm.err file (if you pick the one with the node's hostname, replace the hostname with localhost) and paste it into your browser's address bar:
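Replacing the node name with localhost only changes the host part of the URL; the port, path and token stay the same. A small illustration of the rewrite (to_localhost is a hypothetical helper, not part of Jupyter; the hostname and token prefix are taken from the example output above):

```python
from urllib.parse import urlsplit, urlunsplit

def to_localhost(url: str) -> str:
    """Rewrite the host of a Jupyter URL to 127.0.0.1, keeping port, path and token."""
    parts = urlsplit(url)
    netloc = f"127.0.0.1:{parts.port}" if parts.port else "127.0.0.1"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

url = "http://gpu214-06.ibex.kaust.edu.sa:55479/lab?token=8a998b07"
print(to_localhost(url))  # http://127.0.0.1:55479/lab?token=8a998b07
```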

Once the client connects, the JupyterLab interface opens in your browser.

  • Be aware that the root directory in your Jupyter file browser is the directory you submitted the job from.

  • We can now do some computations. Since this Jupyter job requested a GPU, let's test that the GPU is visible. Note that all the modules your computation requires should have been loaded in your jobscript before submitting.
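A quick sanity check you can run in a notebook cell to confirm the GPU is visible. PyTorch is an assumption here (the machine_learning module bundles several frameworks); substitute your own framework's equivalent call if needed:

```python
def gpu_available():
    """Return True/False for GPU visibility via PyTorch, or None if PyTorch is absent."""
    try:
        import torch              # assumption: provided by your environment
    except ImportError:
        return None
    return torch.cuda.is_available()

print(gpu_available())            # expect True on the v100 node from the example jobscript
```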

Shaheen compute node

Running Jupyter on Shaheen compute nodes is not much different, except for one extra step. The jobscript is very similar:

  • As on Ibex, once the job starts, an output file is created in your submission directory with some instructions:

  • Open a new terminal and copy the first command from the SLURM output file, i.e. an ssh with a tunnel to a CDL login node of Shaheen:

  • Next, in the same terminal as the previous step, copy and paste the next line, which SSHes into the gateway from the CDL login node you have just logged in to:

  • We are now ready to connect to our Jupyter server. Examine the %x-%j-slurm.err file and copy the URL starting with https://127.0.0.1/…. Paste it into your local browser to open the Jupyter client.
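The two tunnelling commands in the steps above have this general shape (a sketch with placeholder hostnames and port only; the SLURM output file contains the exact commands for your job):

```
# 1. From your laptop: tunnel to a CDL login node of Shaheen
ssh -L <port>:localhost:<port> <username>@<cdl-login-node>

# 2. From the CDL login node: a second tunnel on to the compute side via the gateway
ssh -L <port>:<compute-node>:<port> <gateway-node>
```

Chaining the two -L forwards is what lets your local browser reach a compute node that is not directly reachable from outside.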


Terminating running Interactive sessions

When you have finished computing, you must terminate the session properly; otherwise, the SLURM job on the server side will keep running until it hits its wall time.

To close the session correctly, go to the “File” menu and click “Shut Down”.

This triggers the “Shutdown confirmation” window; select “Shut Down”.


When you reach the final window, you should see that both the server and the SLURM job have been stopped.
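If the browser tab is already gone and you cannot reach the “Shut Down” menu, the session can also be ended from a login node by cancelling the SLURM job. This sketch assumes the standard SLURM tools and the job name demo from the example jobscript:

```
# list your running jobs and note the job ID of the Jupyter job
squeue -u $USER --name=demo

# cancel it by job ID
scancel <jobid>
```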