R is a powerful language and is popular among bioinformaticians, statisticians and data scientists.
On Shaheen we recommend using Cray’s R installation, either interactively via R’s CLI or, preferably, for batch processing of R scripts.
R interactive shell on compute node
To run R interactively on a Shaheen compute node, first request a compute node with 32 dedicated threads. This will also give you access to ~126GB of memory:
salloc -t 01:00:00 --hint=nomultithread -n 1 -c 32
Once the allocation is granted, you will land on the gateway node, which can launch jobs on the allocated compute node. An interactive R session needs to run on the compute node itself, so the following step is needed to open a shell there:
srun -c 32 --hint=nomultithread --pty bash
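Before loading R, it can be worth confirming that the shell really is on the compute node and sees the requested cores. The following is a minimal sketch, assuming the standard SLURM environment variables SLURM_JOB_ID and SLURM_CPUS_PER_TASK are exported inside the srun step; the function name show_allocation is only illustrative.

```shell
# Print where this shell is running and what SLURM allocated to it.
# Falls back to "unset" when run outside a SLURM job step.
show_allocation() {
    echo "host: $(uname -n)"
    echo "job id: ${SLURM_JOB_ID:-unset}"
    echo "cpus per task: ${SLURM_CPUS_PER_TASK:-unset}"
}

show_allocation
```

If the host printed is still the gateway node, the srun step above has not been run yet.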
You can now load the module and run R:
module load cray-R
> R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> installed.packages()
           Package      LibPath                          Version
KernSmooth "KernSmooth" "/opt/R/4.1.1.0/lib64/R/library" "2.23-20"
MASS       "MASS"       "/opt/R/4.1.1.0/lib64/R/library" "7.3-54"
Matrix     "Matrix"     "/opt/R/4.1.1.0/lib64/R/library" "1.3-4"
base       "base"       "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"
boot       "boot"       "/opt/R/4.1.1.0/lib64/R/library" "1.3-28"
class      "class"      "/opt/R/4.1.1.0/lib64/R/library" "7.3-19"
cluster    "cluster"    "/opt/R/4.1.1.0/lib64/R/library" "2.1.2"
codetools  "codetools"  "/opt/R/4.1.1.0/lib64/R/library" "0.2-18"
compiler   "compiler"   "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"
datasets   "datasets"   "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"
foreign    "foreign"    "/opt/R/4.1.1.0/lib64/R/library" "0.8-81"
grDevices  "grDevices"  "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"
graphics   "graphics"   "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"
grid       "grid"       "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"
lattice    "lattice"    "/opt/R/4.1.1.0/lib64/R/library" "0.20-44"
.......
graphics   NA NA "yes" "4.1.1"
grid       NA NA "yes" "4.1.1"
lattice    NA NA "yes" "4.1.1"
methods    NA NA "yes" "4.1.1"
mgcv       NA NA "yes" "4.1.1"
nlme       NA NA "yes" "4.1.1"
nnet       NA NA "yes" "4.1.1"
parallel   NA NA "yes" "4.1.1"
rpart      NA NA "yes" "4.1.1"
spatial    NA NA "yes" "4.1.1"
splines    NA NA "yes" "4.1.1"
stats      NA NA "yes" "4.1.1"
stats4     NA NA NA    "4.1.1"
survival   NA NA "yes" "4.1.1"
tcltk      NA NA "yes" "4.1.1"
tools      NA NA "yes" "4.1.1"
utils      NA NA "yes" "4.1.1"
Installing new packages
To install new packages, we recommend installing them in your own scratch directory. To do this, set the environment variable R_LIBS to /scratch/$USER/rlibs or any other directory in scratch:
export R_LIBS=/scratch/$USER/rlibs
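R only adds directories that already exist to its library search path, so the directory named in R_LIBS should be created before R is started. The following is a minimal sketch of a helper that does both steps; the function name setup_r_libs is hypothetical and not part of the Shaheen environment.

```shell
# Create a personal R library directory (if missing) and point R_LIBS at it.
# Usage: setup_r_libs <directory>
setup_r_libs() {
    lib_dir="$1"
    if [ -z "$lib_dir" ]; then
        echo "usage: setup_r_libs <directory>" >&2
        return 1
    fi
    # mkdir -p succeeds silently when the directory already exists
    mkdir -p "$lib_dir" || return 1
    export R_LIBS="$lib_dir"
    echo "R_LIBS set to $R_LIBS"
}
```

With the path used in this section, the call would be setup_r_libs "/scratch/$USER/rlibs", made in the same shell that later starts R.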
Then start the interactive session:
> R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
And install the package:
> install.packages('readxl')
Here, the readxl package is installed as an example. Note that it ends up in the directory pointed to by the R_LIBS variable set earlier. If you don’t set the variable, R will try to install the package into its own installation directory, and the installation will fail because of insufficient permissions.
To use the installed package, set the R_LIBS variable before you start R:
export R_LIBS=/scratch/$USER/rlibs
> R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(readxl)
> readxl
readxl_example   readxl_progress  readxl::
> readxl_example()
 [1] "clippy.xls"    "clippy.xlsx"   "datasets.xls"  "datasets.xlsx"
 [5] "deaths.xls"    "deaths.xlsx"   "geometry.xls"  "geometry.xlsx"
 [9] "type-me.xls"   "type-me.xlsx"
> readxl_example('clippy.xlsx')
[1] "/lustre/scratch/shaima0d/rlibs/readxl/extdata/clippy.xlsx"
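The path in the last line of the session shows that an installed package lives in a subdirectory of the library directory. That makes it possible to check from the shell, without starting R, whether a package is already present. The following is a minimal sketch, assuming R_LIBS names a single directory rather than a colon-separated list; the function name r_pkg_installed is hypothetical.

```shell
# Succeed if package <pkg> is present in library directory <lib_dir>.
# An installed R package is a subdirectory containing a DESCRIPTION file.
# Usage: r_pkg_installed <lib_dir> <pkg>
r_pkg_installed() {
    lib_dir="$1"
    pkg="$2"
    [ -f "$lib_dir/$pkg/DESCRIPTION" ]
}
```

A typical call would be r_pkg_installed "$R_LIBS" readxl && echo "readxl is installed", for example at the top of a jobscript to fail early when a dependency is missing.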
R batch job on compute node
To run a batch job using an R script, simply prepare the script and use Rscript in your SLURM jobscript to launch it. The following R script, pi.R, is a hello-world-style example that estimates Pi by random sampling and can be run as a batch job:
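# Estimate Pi by sampling random points in the square [-1, 1] x [-1, 1]
# and counting how many fall inside the unit circle.
simulation = function(long){
  c = rep(0,long)        # running estimate of Pi after each sample
  numberIn = 0           # points that landed inside the circle so far
  for(i in 1:long){
    x = runif(2,-1,1)
    if(sqrt(x[1]*x[1] + x[2]*x[2]) <= 1){
      numberIn = numberIn + 1
    }
    prop = numberIn / i  # fraction inside approximates Pi/4
    piHat = prop *4
    c[i] = piHat
  }
  return(c)
}

size = 1000
res = simulation(size)
sprintf('calculated Pi value= %f',res[size])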
The SLURM jobscript to execute the above script will look as follows:
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 32
#SBATCH --hint=nomultithread
#SBATCH -t 00:10:00

module load cray-R
srun -n 1 --hint=nomultithread Rscript pi.R
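To submit the job, the jobscript has to be saved to a file and passed to sbatch. The sketch below assumes the file name r_pi.slurm (hypothetical) and that pi.R sits in the directory the job is submitted from. It only calls sbatch when the command is available, so it can be tried on a machine without SLURM.

```shell
# Write the jobscript from this section to a file (the name is an assumption).
jobscript="r_pi.slurm"
cat > "$jobscript" <<'EOF'
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 32
#SBATCH --hint=nomultithread
#SBATCH -t 00:10:00

module load cray-R
srun -n 1 --hint=nomultithread Rscript pi.R
EOF

# Submit only where SLURM is available; otherwise just report the file.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$jobscript"
else
    echo "sbatch not found; wrote $jobscript without submitting"
fi
```

Unless an output file is requested explicitly, SLURM writes the job's output, including the line printed by sprintf, to slurm-<jobid>.out in the submission directory.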