In this article, I explain how to launch a Jupyter server on a compute node of Shaheen or Ibex and run the Jupyter client in the browser on your local laptop/workstation.

The steps are as follows:

  1. Submit a jobscript to the SLURM scheduler that requests a resource allocation and launches the Jupyter server once the allocation becomes available. Note that all required modules should be loaded in the jobscript before submitting.

  2. Reverse connect to the compute node(s) running the Jupyter server.

  3. Launch the Jupyter client in your local browser.

  4. Terminate/end the session.

Jupyter server jobscript

Let’s first look at how to launch a Jupyter server on the Ibex GPU node and connect to it.

Ibex compute node

For example, the following is a jobscript requesting GPU resources on Ibex.

#!/bin/bash --login
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=v100:1
#SBATCH --cpus-per-gpu=6  
#SBATCH --mem=32G
#SBATCH --partition=batch 
#SBATCH --job-name=demo
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j-slurm.out
#SBATCH --error=%x-%j-slurm.err 
 
# use srun to launch the Jupyter server in order to reserve a port

srun launch-jupyter-server.srun
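Note that srun executes launch-jupyter-server.srun (shown next) directly, so the script must have execute permission; if the job fails with a permission error, mark it executable first:

# make the helper script executable (one-time step)
chmod u+x launch-jupyter-server.srun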

This is the launch-jupyter-server.srun script:

#!/bin/bash -l

# Load an environment which has Jupyter installed. It can be one of the following:
# - the Machine Learning module installed on the system (module load machine_learning)
# - your own conda environment on Ibex
# - a singularity container with a Python environment (conda or otherwise)

# setup the environment
module purge

# You can use the machine learning module 
module load machine_learning/2022.11
# or you can activate the conda environment directly by uncommenting the following lines
#export ENV_PREFIX=$PWD/env
#conda activate $ENV_PREFIX

# set up SSH tunneling
# get tunneling info
export XDG_RUNTIME_DIR=/tmp
node=$(hostname -s)
user=$(whoami)
submit_host=${SLURM_SUBMIT_HOST}
port=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "${node} pinned to port ${port} on ${submit_host}"

# print tunneling instructions
echo -e "
${node} pinned to port ${port} on ${submit_host}
To connect to the compute node ${node} on Ibex running your Jupyter server, run the following command in a terminal to create an SSH tunnel from your workstation/laptop to glogin:

ssh -L ${port}:${node}.ibex.kaust.edu.sa:${port} ${user}@glogin.ibex.kaust.edu.sa

Copy the link printed below by the Jupyter server and replace the node hostname with localhost before pasting it in the browser on your workstation/laptop.
" >&2
 
# Run Jupyter 
#jupyter notebook --no-browser --port=${port} --port-retries=0 --ip=${node}

# launch jupyter server
jupyter ${1:-lab} --no-browser --port=${port} --port-retries=0  --ip=${node}.ibex.kaust.edu.sa
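The ${1:-lab} expansion starts JupyterLab by default, but any argument passed to the script is forwarded to the jupyter command. For example, to get the classic notebook interface instead, change the srun line in the jobscript to:

# launch the classic notebook interface instead of JupyterLab
srun launch-jupyter-server.srun notebook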

You can also combine both blocks of code in a single script file; the Shaheen section below shows such an example.

To submit the above jobscript (e.g. jupyter_notebook.slurm) to the scheduler:

sbatch jupyter_notebook.slurm
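The job may wait in the queue before resources become available. A quick way to watch its state, assuming the job name demo set in the jobscript above:

# list your queued and running jobs named demo
squeue -u $USER --name=demo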

Once the job starts, the SLURM output files created in the directory you submitted the job from will contain the instructions on how to reverse connect.

The SLURM output will look something like this:

Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
UCX 1.9.0 is now loaded
Open MPI 4.0.3 is now loaded
Loading module for cudnn
cudnn is now loaded
Loading module for CUDA 11.2.2
CUDA 11.2.2 is now loaded
GNU 9.2.0 is now loaded
Loading module for Machine Learning 2022.11
Machine Learning 2022.11 is now loaded
 
gpu214-06 pinned to port 55479 on login510-27 
To connect to the compute node gpu214-06 on Ibex running your Jupyter server, run the following command in a terminal to create an SSH tunnel from your workstation/laptop to glogin:
 
ssh -L 55479:gpu214-06.ibex.kaust.edu.sa:55479 barradd@glogin.ibex.kaust.edu.sa 
 
Copy the link printed below by the Jupyter server and replace the node hostname with localhost before pasting it in the browser on your workstation/laptop.

[I 2023-01-31 08:38:56.289 ServerApp] dask_labextension | extension was successfully linked.
[I 2023-01-31 08:38:56.298 ServerApp] jupyter_server_mathjax | extension was successfully linked.
[I 2023-01-31 08:38:56.298 ServerApp] jupyter_server_proxy | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab_git | extension was successfully linked.
[I 2023-01-31 08:38:56.309 ServerApp] jupyterlab_nvdashboard | extension was successfully linked.
[I 2023-01-31 08:38:56.320 ServerApp] nbclassic | extension was successfully linked.
[I 2023-01-31 08:38:56.320 ServerApp] nbdime | extension was successfully linked.
[I 2023-01-31 08:38:56.492 ServerApp] dask_labextension | extension was successfully loaded.
[I 2023-01-31 08:38:56.493 ServerApp] jupyter_server_mathjax | extension was successfully loaded.
[I 2023-01-31 08:38:57.035 ServerApp] jupyter_server_proxy | extension was successfully loaded.
[I 2023-01-31 08:38:57.036 LabApp] JupyterLab extension loaded from /sw/csgv/machine_learning/2022.11/el7_cudnn8.2_cuda11.2_py3.8_env/machine_learning-module/env/lib/python3.9/site-packages/jupyterlab
[I 2023-01-31 08:38:57.036 LabApp] JupyterLab application directory is /sw/csgv/machine_learning/2022.11/el7_cudnn8.2_cuda11.2_py3.8_env/machine_learning-module/env/share/jupyter/lab
[I 2023-01-31 08:38:57.040 ServerApp] jupyterlab | extension was successfully loaded.
[I 2023-01-31 08:38:57.044 ServerApp] jupyterlab_git | extension was successfully loaded.
[W 2023-01-31 08:38:57.044 ServerApp] jupyterlab_nvdashboard | extension failed loading with message: 'NoneType' object is not callable
[E 2023-01-31 08:38:57.044 ServerApp] jupyterlab_nvdashboard | stack trace
[I 2023-01-31 08:38:57.059 ServerApp] nbclassic | extension was successfully loaded.
[I 2023-01-31 08:38:57.215 ServerApp] nbdime | extension was successfully loaded.
[I 2023-01-31 08:38:57.216 ServerApp] Serving notebooks from local directory: /ibex/scratch/barradd/Data-science-onboarding-2022
[I 2023-01-31 08:38:57.216 ServerApp] Jupyter Server 1.21.0 is running at:
[I 2023-01-31 08:38:57.216 ServerApp] http://gpu214-06.ibex.kaust.edu.sa:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
[I 2023-01-31 08:38:57.216 ServerApp]  or http://127.0.0.1:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
[I 2023-01-31 08:38:57.216 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2023-01-31 08:38:57.225 ServerApp] 
    
    To access the server, open this file in a browser:
        file:///home/barradd/.local/share/jupyter/runtime/jpserver-44653-open.html
    Or copy and paste one of these URLs:
        http://gpu214-06.ibex.kaust.edu.sa:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
     or http://127.0.0.1:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776
  • Open a new terminal on your local machine and copy and paste the SSH tunnel command from the %x-%j-slurm.err file:

ssh -L 55479:gpu214-06.ibex.kaust.edu.sa:55479 your-username@glogin.ibex.kaust.edu.sa

  • This creates an SSH tunnel between the Ibex compute node running your Jupyter server and your local machine, listening on localhost port 55479.

  • Now we are ready to launch the Jupyter client. Copy the URL starting with http://127.0.0.1 from the %x-%j-slurm.err file (or either of the last two URLs, replacing the node hostname with localhost) and paste it into your browser address bar:

http://127.0.0.1:55479/lab?token=8a998b0772313ce6e5cca9aca1f13f2faff18d950d78c776

Once it connects, the JupyterLab interface appears in your browser.

  • Be aware that the root directory in your Jupyter file browser is the directory you submitted the job from.

  • We can now do some computations. Since this Jupyter job requested a GPU, let's test it, as sketched below. Note that all the required modules should have been loaded in your jobscript before submitting.
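For example, a minimal GPU check, assuming the machine_learning module loaded in the jobscript provides PyTorch; run these in a JupyterLab terminal, or prefix each command with ! in a notebook cell:

# show the GPU allocated to this job
nvidia-smi

# confirm that PyTorch can see the GPU
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"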

Shaheen compute node

Running Jupyter on Shaheen compute nodes is not much different, except for one extra step. The jobscript is very similar:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=workq
#SBATCH --time=00:30:00 
#SBATCH -A k1033
#SBATCH --job-name=demo
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j-slurm.out
#SBATCH --error=%x-%j-slurm.err 

export LC_ALL=C.UTF-8
export LANG=C.UTF-8

### Load the modules you need for your job
### and swap the PrgEnv module if needed
module swap PrgEnv-cray PrgEnv-intel
module load intelpython3/2022.0.2.155 pytorch/1.8.0

############################################################
## Load the conda base and activate the conda environment ##
############################################################
############################################################
## export the path to the conda base and conda_cache
############################################################
# export ENV_PREFIX=/project/k1033/barradd/install_miniconda_shaheen
# export CONDA_PKGS_DIRS=$PWD/conda_cache
############################################################ 
## activate conda base from the command line
############################################################
#source $ENV_PREFIX/miniconda3/bin/activate $ENV_PREFIX/env
## Or be very explicit and use the full path to the activate script
#source /project/k1033/barradd/install_miniconda_shaheen/miniconda3/bin/activate /project/k1033/barradd/install_miniconda_shaheen/env


# set up SSH tunneling
# get tunneling info
export XDG_RUNTIME_DIR=/tmp
node=$(hostname -s)
user=$(whoami)
submit_host=${SLURM_SUBMIT_HOST}
gateway=${EPROXY_LOGIN}
port=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "${node} pinned to port ${port} on ${gateway}"

# print tunneling instructions
echo -e "
To connect to the compute node ${node} on Shaheen running your Jupyter server,
you need to run the following two commands in a terminal.
1. Command to create an SSH tunnel from your workstation/laptop to cdlX:
ssh -L ${port}:localhost:${port} ${user}@${submit_host}.hpc.kaust.edu.sa
2. Command to create an SSH tunnel from cdlX to the compute node (run on cdlX):
ssh -L ${port}:${node}:${port} ${user}@${gateway}

Copy the link printed below by the Jupyter server and replace the nid0XXXX hostname with localhost before pasting it in the browser on your workstation/laptop. Do not forget to close the notebooks you opened and shut down the Jupyter client in your browser to exit this job gracefully; otherwise you will have to cancel the SLURM job running your Jupyter server manually.
"

echo "Starting jupyter server in background with requested resouce"

# Run Jupyter
# jupyter lab --no-browser --port=${port} --ip=${node} 
jupyter ${1:-lab} --no-browser --port=${port} --port-retries=0  --ip=${node}
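Submission works the same way as on Ibex; assuming you saved the script as, say, jupyter_shaheen.slurm (the file name is arbitrary):

sbatch jupyter_shaheen.slurm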

  • As on Ibex, once the job starts, an output file will be created in your submission directory with the instructions:

nid00141 pinned to port 58833 on gateway1

To connect to the compute node nid00141 on Shaheen running your Jupyter server,
you need to run the following two commands in a terminal.
1. Command to create an SSH tunnel from your workstation/laptop to cdlX:
ssh -L 58833:localhost:58833 barradd@cdl3.hpc.kaust.edu.sa
2. Command to create an SSH tunnel from cdlX to the compute node (run on cdlX):
ssh -L 58833:nid00141:58833 barradd@gateway1

Copy the link printed below by the Jupyter server and replace the nid0XXXX hostname with localhost before pasting it in the browser on your workstation/laptop. Do not forget to close the notebooks you opened and shut down the Jupyter client in your browser to exit this job gracefully; otherwise you will have to cancel the SLURM job running your Jupyter server manually.

Starting Jupyter server with the requested resources
  • Open a new terminal and run the first command from the SLURM output file, i.e. SSH with a tunnel to the CDL login node of Shaheen:

$ ssh -L 58833:localhost:58833 barradd@cdl3.hpc.kaust.edu.sa
Use of this system is limited to users who have been properly authorised by
the KAUST Supercomputing Laboratory. Unauthorised users must disconnect
immediately.

For support, see http://www.hpc.kaust.edu.sa/
or email help@hpc.kaust.edu.sa
barradd@cdl3.hpc.kaust.edu.sa's password: 
(barradd@cdl3.hpc.kaust.edu.sa) One-time password (OATH) for `barradd': 
Last failed login: Tue Jan 31 10:10:35 +03 2023 from kl-23568.kaust.edu.sa on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Tue Jan 31 09:25:54 2023 from kl-23568.kaust.edu.sa
               ____  _           _
              / ___|| |__   __ _| |__   ___  ___ _ __
              \___ \| '_ \ / _` | '_ \ / _ \/ _ \ '_ \
               ___) | | | | (_| | | | |  __/  __/ | | |
              |____/|_| |_|\__,_|_| |_|\___|\___|_| |_|

Shaheen is a 36 rack Cray XC40 system. The front-end environment is
running SUSE Linux Enterprise Server 15.
  • Next, in the same terminal, run the second command, which SSHes into the gateway from the CDL login node you just logged in to:

$ ssh -L 58833:nid00141:58833 barradd@gateway1
Last login: Tue Jan 31 09:26:18 2023 from cdl3.hpc.kaust.edu.sa

*** Welcome to IMPS service node c6-0c1s1n1 (nid 1221) ***
    Running 1.2GB Suse 15.2 image shaheen_login-large 
    CLE release 7.0.UP03, build 7.0.3415(20220906)
    16 vcores, boot_freemem: 28466mb

barradd@gateway1:~> 
  • We are now ready to connect to our Jupyter server. Examine the %x-%j-slurm.err file and copy the URL starting with http://127.0.0.1. Paste it into the browser on your local machine to open the Jupyter client.

cat demo-30201490-slurm.err
[I 2023-01-31 10:10:50.692 ServerApp] ipyparallel | extension was successfully linked.
[I 2023-01-31 10:10:50.694 ServerApp] jupyter_server_proxy | extension was successfully linked.
[I 2023-01-31 10:10:50.717 ServerApp] jupyterlab | extension was successfully linked.
[I 2023-01-31 10:10:50.717 ServerApp] jupyterlab_nvdashboard | extension was successfully linked.
[I 2023-01-31 10:10:50.741 ServerApp] nbclassic | extension was successfully linked.
[I 2023-01-31 10:10:50.895 ServerApp] notebook_shim | extension was successfully linked.
[I 2023-01-31 10:10:51.799 ServerApp] notebook_shim | extension was successfully loaded.
[I 2023-01-31 10:10:51.948 ServerApp] Loading IPython parallel extension
[I 2023-01-31 10:10:51.949 ServerApp] ipyparallel | extension was successfully loaded.
[I 2023-01-31 10:10:56.200 ServerApp] jupyter_server_proxy | extension was successfully loaded.
[I 2023-01-31 10:10:56.222 LabApp] JupyterLab extension loaded from /sw/xc40cle7up03/intelpython3/2022_0_2_155/sles15_gcc7.5.0/intelpython/latest/lib/python3.9/site-packages/jupyterlab
[I 2023-01-31 10:10:56.222 LabApp] JupyterLab application directory is /sw/xc40cle7up03/intelpython3/2022_0_2_155/sles15_gcc7.5.0/intelpython/python3.9/share/jupyter/lab
[I 2023-01-31 10:10:56.293 ServerApp] jupyterlab | extension was successfully loaded.
[W 2023-01-31 10:10:56.293 ServerApp] jupyterlab_nvdashboard | extension failed loading with message: 'NoneType' object is not callable
[E 2023-01-31 10:10:56.293 ServerApp] jupyterlab_nvdashboard | stack trace
[I 2023-01-31 10:10:56.574 ServerApp] nbclassic | extension was successfully loaded.
[I 2023-01-31 10:10:56.576 ServerApp] Serving notebooks from local directory: /lustre/scratch/barradd/shaheen_miniconda_intall/script
[I 2023-01-31 10:10:56.576 ServerApp] Jupyter Server 1.19.1 is running at:
[I 2023-01-31 10:10:56.576 ServerApp] http://nid00141:58833/lab?token=289c6006ad9547eba18db9926e35657e141d887546deedf3
[I 2023-01-31 10:10:56.576 ServerApp]  or http://127.0.0.1:58833/lab?token=289c6006ad9547eba18db9926e35657e141d887546deedf3
[I 2023-01-31 10:10:56.576 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2023-01-31 10:10:56.613 ServerApp] 
    
    To access the server, open this file in a browser:
        file:///lustre/scratch/barradd/.local/share/jupyter/runtime/jpserver-47285-open.html
    Or copy and paste one of these URLs:
        http://nid00141:58833/lab?token=289c6006ad9547eba18db9926e35657e141d887546deedf3
     or http://127.0.0.1:58833/lab?token=289c6006ad9547eba18db9926e35657e141d887546deedf3

The URL copied from the SLURM error file, ready to paste into your browser:

http://127.0.0.1:58833/lab?token=289c6006ad9547eba18db9926e35657e141d887546deedf3
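A convenient way to pull this URL out of the error file directly, assuming the job name demo and the --error pattern used in the jobscript above:

# print the lines containing the local URL
grep "127.0.0.1" demo-*-slurm.err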

Terminating a running interactive session

When you have finished computing, it is mandatory to terminate the session properly; otherwise the SLURM job on the server side will keep running until it hits its wall time.

To close the session correctly, go to the “File” menu and click “Shut Down”.

This triggers the “Shutdown confirmation” window; select “Shut Down”.

After the shutdown completes, you should see that both the server and the SLURM job have been stopped.
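If a session was closed without a proper shutdown, you can still cancel the job manually from a login node; a minimal sketch, assuming the job name demo used in the examples above:

# find the job ID, then cancel the job
squeue -u $USER --name=demo
scancel <jobid>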
