R is a powerful language and is popular among bioinformaticians, statisticians and data scientists.

On Shaheen we recommend using Cray’s R installation, either interactively through R’s command-line interface or, preferably, for batch processing of R scripts.

R interactive shell on compute node

To run R interactively on a Shaheen compute node, first request a compute node with 32 dedicated threads. This will also give you access to ~126GB of memory:

salloc -t 01:00:00 --hint=nomultithread -n 1 -c 32

Once the allocation is granted, you will land on the gateway node, which can launch jobs on the allocated compute node. An interactive R session needs to run on the compute node itself, so the following step is required:

srun -c 32 --hint=nomultithread --pty bash

You can now load the module and run R:

module load cray-R
> R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> installed.packages()      
           Package      LibPath                          Version  
KernSmooth "KernSmooth" "/opt/R/4.1.1.0/lib64/R/library" "2.23-20"
MASS       "MASS"       "/opt/R/4.1.1.0/lib64/R/library" "7.3-54" 
Matrix     "Matrix"     "/opt/R/4.1.1.0/lib64/R/library" "1.3-4"  
base       "base"       "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"  
boot       "boot"       "/opt/R/4.1.1.0/lib64/R/library" "1.3-28" 
class      "class"      "/opt/R/4.1.1.0/lib64/R/library" "7.3-19" 
cluster    "cluster"    "/opt/R/4.1.1.0/lib64/R/library" "2.1.2"  
codetools  "codetools"  "/opt/R/4.1.1.0/lib64/R/library" "0.2-18" 
compiler   "compiler"   "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"  
datasets   "datasets"   "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"  
foreign    "foreign"    "/opt/R/4.1.1.0/lib64/R/library" "0.8-81" 
grDevices  "grDevices"  "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"  
graphics   "graphics"   "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"  
grid       "grid"       "/opt/R/4.1.1.0/lib64/R/library" "4.1.1"  
lattice    "lattice"    "/opt/R/4.1.1.0/lib64/R/library" "0.20-44"
.......
graphics   NA      NA     "yes"            "4.1.1"
grid       NA      NA     "yes"            "4.1.1"
lattice    NA      NA     "yes"            "4.1.1"
methods    NA      NA     "yes"            "4.1.1"
mgcv       NA      NA     "yes"            "4.1.1"
nlme       NA      NA     "yes"            "4.1.1"
nnet       NA      NA     "yes"            "4.1.1"
parallel   NA      NA     "yes"            "4.1.1"
rpart      NA      NA     "yes"            "4.1.1"
spatial    NA      NA     "yes"            "4.1.1"
splines    NA      NA     "yes"            "4.1.1"
stats      NA      NA     "yes"            "4.1.1"
stats4     NA      NA     NA               "4.1.1"
survival   NA      NA     "yes"            "4.1.1"
tcltk      NA      NA     "yes"            "4.1.1"
tools      NA      NA     "yes"            "4.1.1"
utils      NA      NA     "yes"            "4.1.1"

Installing new packages

To install new packages, we recommend using your own scratch directory. Set the environment variable R_LIBS to /scratch/$USER/rlibs, or to any other directory in scratch:

export R_LIBS=/scratch/$USER/rlibs

Then start the interactive session:

> R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

And install the package:

> install.packages('readxl')

Here the readxl package is installed as an example. Note that it ends up in the directory pointed to by the R_LIBS variable set earlier. If you don’t set the variable, R will try to install the package into its own installation directory and fail because of a permissions error.
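One detail worth checking before installing: R only uses library paths from R_LIBS that already exist, so the directory should be created first. The following shell sketch wraps that step in a small helper. The function name setup_r_libs is hypothetical (it is not part of R or the Cray module), and the default path simply mirrors the one used above.

```shell
# Hypothetical helper (not part of R or the Cray module): create a personal
# R library directory and point R_LIBS at it.
setup_r_libs() {
    # Default mirrors the path used in this guide; pass another scratch path to override.
    local lib_dir="${1:-/scratch/$USER/rlibs}"
    mkdir -p "$lib_dir" || return 1
    export R_LIBS="$lib_dir"
    echo "R_LIBS set to $R_LIBS"
}

# Example (run on the compute node after 'module load cray-R'):
#   setup_r_libs
#   Rscript -e 'install.packages("readxl", repos = "https://cloud.r-project.org")'
```

Passing repos explicitly, as in the commented example, avoids the interactive CRAN mirror prompt, which is convenient when the installation is scripted.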

Passing compiler configuration flags

Sometimes R needs extra help to find where include headers and dynamic libraries are located. This usually comes up in the configure step of packages containing C, C++, or Fortran code. For example, the ncdf4 package depends on NetCDF and needs to know where its headers are. Such flags can be listed in the file /scratch/$USER/.R/Makevars, and R will honor them when compiling the package.

note

The following steps are executed in an interactive session on a Shaheen compute node.

srun -n 1 -c 32 -t 0:30:0 -p debug --pty bash
module swap PrgEnv-cray PrgEnv-gnu
module load cray-netcdf
module load cray-R

Create the Makevars file if it doesn’t already exist:

mkdir -p /scratch/$USER/.R
touch /scratch/$USER/.R/Makevars

The contents of Makevars can be:

CC=cc
CXX=CC
FC=ftn
CFLAGS=-I/opt/cray/pe/netcdf/4.7.4.4/GNU/8.2/include
LDFLAGS=-L/opt/cray/pe/netcdf/4.7.4.4/GNU/8.2/lib
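
As a sketch of how the file above could be generated rather than typed by hand, the following shell helper writes the same five settings for a given NetCDF prefix. The function name write_makevars and its arguments are hypothetical; the prefix in the commented example is the one from the listing above and will differ with other module versions. Note also that R normally looks for a personal Makevars under ~/.R/, so if your home directory is not /scratch/$USER, pointing the R_MAKEVARS_USER environment variable at the file is one way to make sure it is read.

```shell
# Hypothetical helper: write a Makevars file for a given NetCDF install prefix.
write_makevars() {
    local dest="$1" prefix="$2"
    if [ -z "$dest" ] || [ -z "$prefix" ]; then
        echo "usage: write_makevars <makevars-path> <netcdf-prefix>" >&2
        return 1
    fi
    # Makevars must be a regular file, not a directory.
    if [ -d "$dest" ]; then
        echo "error: $dest is a directory, expected a file" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")" || return 1
    cat > "$dest" <<EOF
CC=cc
CXX=CC
FC=ftn
CFLAGS=-I${prefix}/include
LDFLAGS=-L${prefix}/lib
EOF
}

# Example, using the prefix from the listing above:
#   write_makevars "/scratch/$USER/.R/Makevars" /opt/cray/pe/netcdf/4.7.4.4/GNU/8.2
#   export R_MAKEVARS_USER="/scratch/$USER/.R/Makevars"
```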

Now we can call the package installer, either in an interactive R session or with the R CMD INSTALL command-line interface.

shaima0d@nid00008:/lustre/scratch/project/k01/exclude/shaima0d/tickets/48534> R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> install.packages('ncdf4')
Installing package into '/lustre/scratch/project/k01/exclude/shaima0d/tickets/48534/libs'
(as 'lib' is unspecified)
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors 

 1: 0-Cloud [https]
 2: Australia (Canberra) [https]
 ....
77: Uruguay [https]
78: (other mirrors)

Selection: 1
trying URL 'https://cloud.r-project.org/src/contrib/ncdf4_1.21.tar.gz'
Content type 'application/x-gzip' length 127380 bytes (124 KB)
==================================================
downloaded 124 KB

* installing *source* package 'ncdf4' ...
** package 'ncdf4' successfully unpacked and MD5 sums checked
** using staged installation
configure.ac: starting
checking for nc-config... yes
Using nc-config: nc-config
Output of nc-config --all:

This netCDF 4.7.4 has been built with the following features: 

  --cc            -> cc
  --cflags        -> -DpgiFortran
  --libs          -> 
  --static        -> -lhdf5_hl -lhdf5 -lm -lz -ldl

  --has-c++       -> no
  --cxx           -> 

  --has-c++4      -> yes
  --cxx4          -> CC
  --cxx4flags     -> -DpgiFortran
  --cxx4libs      -> 

  --has-fortran   -> yes
  --fc            -> ftn
  --fflags        -> 
  --flibs         -> 
  --has-f90       -> 
  --has-f03       -> yes

  --has-dap       -> no
  --has-dap2      -> no
  --has-dap4      -> no
  --has-nc2       -> yes
  --has-nc4       -> yes
  --has-hdf5      -> yes
  --has-hdf4      -> no
  --has-logging   -> no
  --has-pnetcdf   -> no
  --has-szlib     -> no
  --has-cdf5      -> yes
  --has-parallel4 -> no
  --has-parallel  -> no

  --prefix        -> /opt/cray/pe/netcdf/4.7.4.4/GNU/8.2
  --includedir    -> /opt/cray/pe/netcdf/4.7.4.4/GNU/8.2/include
  --libdir        -> /opt/cray/pe/netcdf/4.7.4.4/gnu/8.2/lib
  --version       -> netCDF 4.7.4

---
netcdf.m4: about to set rpath, here is source string: ><
netcdf.m4: final rpath:  
Netcdf library version: netCDF 4.7.4
Netcdf library has version 4 interface present: yes
Netcdf library was compiled with C compiler: cc
configure: creating ./config.status
config.status: creating src/Makevars
 
**********************  Results of ncdf4 package configure *******************
 
netCDF v4 CPP flags      = -DpgiFortran
netCDF v4 LD flags       =    
netCDF v4 runtime path   =  
 
netCDF C compiler used   = cc
R      C compiler used   = cc -I/opt/cray/pe/netcdf/4.7.4.4/GNU/8.2/include
 
******************************************************************************
 
** libs
cc -I"/opt/R/4.1.1.0/lib64/R/include" -DNDEBUG -DpgiFortran  -I/usr/local/include   -fpic  -I/opt/cray/pe/netcdf/4.7.4.4/GNU/8.2/include -c ncdf.c -o ncdf.o
cc -Wl,-rpath,/opt/cray/pe/gcc-libs -shared -L/usr/local/lib64 -o ncdf4.so ncdf.o
installing to /lustre/scratch/project/k01/exclude/shaima0d/tickets/48534/libs/00LOCK-ncdf4/00new/ncdf4/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (ncdf4)

The downloaded source packages are in
	'/tmp/RtmpdVkhKP/downloaded_packages'
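
The transcript above uses the interactive route. For the R CMD INSTALL route mentioned earlier, a minimal sketch might look like the following, assuming the source tarball has already been downloaded (for example from a CRAN mirror) and that the modules and Makevars are set up as described. The function name install_tarball and the default library path are hypothetical; R CMD INSTALL and its --library option are standard.

```shell
# Hypothetical helper: install an R source package tarball into a chosen library.
install_tarball() {
    local tarball="$1"
    # Assumed default; pass a second argument to use another library directory.
    local lib_dir="${2:-/scratch/$USER/rlibs}"
    if [ ! -f "$tarball" ]; then
        echo "error: tarball not found: $tarball" >&2
        return 1
    fi
    if ! command -v R >/dev/null 2>&1; then
        echo "error: R not found; run 'module load cray-R' first" >&2
        return 1
    fi
    mkdir -p "$lib_dir" || return 1
    R CMD INSTALL --library="$lib_dir" "$tarball"
}

# Example (tarball name taken from the transcript above):
#   install_tarball ncdf4_1.21.tar.gz "/scratch/$USER/rlibs"
```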

Using R packages

To use the installed packages, set the R_LIBS variable before you start R:

export R_LIBS=/scratch/$USER/rlibs
> R
R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(readxl)
> readxl 
readxl_example   readxl_progress  readxl::         
> readxl_example()
 [1] "clippy.xls"    "clippy.xlsx"   "datasets.xls"  "datasets.xlsx"
 [5] "deaths.xls"    "deaths.xlsx"   "geometry.xls"  "geometry.xlsx"
 [9] "type-me.xls"   "type-me.xlsx" 
> readxl_example('clippy.xlsx')
[1] "/lustre/scratch/shaima0d/rlibs/readxl/extdata/clippy.xlsx"

R batch job on compute node

To run a batch job using an R script, simply prepare the script and use Rscript in your SLURM jobscript to launch it. The following R script, saved as pi.R, is a simple example that estimates the value of pi by Monte Carlo sampling and can be run as a batch job:

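# Estimate pi by drawing random points in the square [-1,1] x [-1,1]
# and counting the fraction that falls inside the unit circle.
simulation = function(long){
  estimates = rep(0,long)   # running estimate of pi after each sample
  numberIn = 0              # points that landed inside the circle so far
  for(i in 1:long){
    x = runif(2,-1,1)
    if(sqrt(x[1]*x[1] + x[2]*x[2]) <= 1){
      numberIn = numberIn + 1
    }
    prop = numberIn / i
    piHat = prop *4
    estimates[i] = piHat
  }
  return(estimates)
}

size = 1000
res = simulation(size)
sprintf('calculated Pi value= %f',res[size])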

The SLURM jobscript to execute the above script will look as follows:

#!/bin/bash
#SBATCH -n 1 
#SBATCH -c 32
#SBATCH --hint=nomultithread
#SBATCH -t 00:10:00


module load cray-R
srun -n 1 --hint=nomultithread Rscript pi.R
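
To submit the job, the R script needs to be saved under the name the jobscript expects (pi.R here) and the jobscript is passed to sbatch. The following is a hedged sketch of that step: the helper name submit_r_job and the jobscript filename r_pi.slurm are hypothetical, and the check simply confirms that the jobscript calls Rscript on the same file before submitting.

```shell
# Hypothetical helper: sanity-check the files, then submit the job with sbatch.
submit_r_job() {
    local jobscript="$1" rscript="$2" f
    for f in "$jobscript" "$rscript"; do
        if [ ! -f "$f" ]; then
            echo "error: file not found: $f" >&2
            return 1
        fi
    done
    # The jobscript should launch the same R script that was just checked.
    if ! grep -qF "Rscript $(basename "$rscript")" "$jobscript"; then
        echo "error: $jobscript does not call Rscript on $(basename "$rscript")" >&2
        return 1
    fi
    sbatch "$jobscript"
}

# Example (r_pi.slurm is an assumed filename for the jobscript above):
#   submit_r_job r_pi.slurm pi.R
```

Unless an output file is specified in the jobscript, SLURM writes the job's output, including the line printed by sprintf, to slurm-<jobid>.out in the submission directory.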