Running R on Shaheen

R is a powerful language and is popular among bioinformaticians, statisticians and data scientists.

On Shaheen we recommend using Cray’s R installation, either interactively via R’s CLI or, preferably, for batch processing of R scripts.

R interactive shell on compute node

To run R interactively on a Shaheen compute node, first request a compute node with 32 dedicated threads. This will also give you access to ~126GB of memory:

salloc -t 01:00:00 --hint=nomultithread -n 1 -c 32

Once the allocation is granted, you will land on a gateway node, which can launch jobs on the allocated compute node. An interactive R session needs to run on the compute node itself, so open a shell there with the following step:

srun -c 32 --hint=nomultithread --pty bash

You can now load the module and run R:

module load cray-R
R

Installing new packages

For installing new packages, we recommend you install them in your own scratch directory. To do so, set the environment variable R_LIBS=/scratch/$USER/rlibs, or point it to any other directory in scratch.
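As a minimal sketch of this step, assuming the location /scratch/$USER/rlibs named in the text (any other scratch directory works the same way), the variable can be exported and the directory created in the shell from which R will be started. R only adds library directories that already exist to its search path, so the directory is created first.

```shell
# Library location taken from the text; adjust if you prefer another scratch directory.
export R_LIBS="/scratch/${USER}/rlibs"

# Create the directory; R skips library paths that do not exist at startup.
mkdir -p "${R_LIBS}"

echo "R packages will be installed in: ${R_LIBS}"
```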

Then start the interactive session on a compute node and launch R, as described above.

And install the package:
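A minimal sketch of the installation command, run non-interactively with Rscript so it can be pasted into the shell on the compute node. It assumes the cray-R module is loaded, R_LIBS is set, and the node can reach a CRAN mirror; the mirror URL is an assumption, not something this page specifies. Inside an interactive R session, the same install.packages() call can be typed at the prompt.

```shell
# Package from the example in the text; the CRAN mirror is an assumption.
PKG="readxl"
CRAN_MIRROR="https://cloud.r-project.org"
INSTALL_EXPR="install.packages('${PKG}', repos = '${CRAN_MIRROR}')"

if command -v Rscript >/dev/null 2>&1; then
    # With R_LIBS set, the package is installed into the first directory of .libPaths().
    Rscript -e "${INSTALL_EXPR}"
else
    echo "Rscript not found; run 'module load cray-R' first." >&2
fi
```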

Here, as an example, we install the readxl package. Note that it will be installed in the directory pointed to by the R_LIBS variable set earlier. If you don’t set the variable, R will try to install the package into the library of the system-wide R installation, which fails because that location is not writable by users.

Passing compiler configuration flags

Sometimes R needs extra help to find where include headers and dynamic libraries are located. This is usually needed in the configure step of packages containing C, C++, or Fortran code. For example, a package that depends on NetCDF needs to be told where the NetCDF headers and libraries are. These flags can be listed in the file /scratch/$USER/.R/Makevars, and R will honor them when compiling the package.

The following steps are executed in an interactive session on a Shaheen compute node.

Create the Makevars if it doesn’t already exist:
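One way to do this, sketched under the assumption that the file lives at /scratch/$USER/.R/Makevars as stated above. By default R looks for a user Makevars file under ~/.R, so the sketch also exports R_MAKEVARS_USER, the environment variable R consults for an alternative location. Whether this export is needed on Shaheen depends on how the home directory is set up there.

```shell
# Path taken from the text above.
MAKEVARS_FILE="/scratch/${USER}/.R/Makevars"

# Create the parent directory and an empty file; touch leaves existing content untouched.
mkdir -p "$(dirname "${MAKEVARS_FILE}")"
touch "${MAKEVARS_FILE}"

# Point R at this file explicitly (assumption: it is not already found via ~/.R/Makevars).
export R_MAKEVARS_USER="${MAKEVARS_FILE}"
```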

The contents of Makevars can be:
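The exact contents depend on the package being built. As an illustrative sketch for the NetCDF case, the snippet below appends include and library search paths to the Makevars file. NETCDF_DIR is an assumption here: it stands for the NetCDF installation prefix and is expected to be present in the environment at build time (for example, set by a NetCDF module). Substitute the real prefix if it is not.

```shell
MAKEVARS_FILE="/scratch/${USER}/.R/Makevars"
mkdir -p "$(dirname "${MAKEVARS_FILE}")"

# The quoted 'EOF' keeps $(NETCDF_DIR) literal, so make expands it when the package is compiled.
cat >> "${MAKEVARS_FILE}" <<'EOF'
CPPFLAGS = -I$(NETCDF_DIR)/include
LDFLAGS = -L$(NETCDF_DIR)/lib
EOF
```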

Now we can call the package installer, either in an interactive R session or with the R CMD INSTALL command-line interface.
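For the command-line route, a hedged sketch: the helper below wraps R CMD INSTALL and installs a source tarball into the library given by R_LIBS, assuming R_LIBS holds a single directory. The function name install_tarball and the tarball name in the usage comment are hypothetical, since this page does not name the package.

```shell
# Hypothetical helper: install a source tarball into the R_LIBS library.
install_tarball() {
    tarball="$1"
    if [ ! -f "${tarball}" ]; then
        echo "Tarball not found: ${tarball}" >&2
        return 1
    fi
    # --library selects the target library; Makevars flags apply during compilation.
    R CMD INSTALL --library="${R_LIBS}" "${tarball}"
}

# Usage (hypothetical file name):
#   install_tarball mypackage_1.0.tar.gz
```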

Using R packages

To use an installed package, set the R_LIBS variable to the directory where it was installed before starting R, so that R can find the package when it is loaded.

R batch job on compute node

To run an R script as a batch job, prepare the script and use Rscript in your SLURM jobscript to launch it. The following R script is a hello world example to be run as a batch job:
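The script itself could look like the sketch below, written to a file from the shell. The file name hello.R and the message are illustrative choices, not taken from the original page.

```shell
# Write a small R script; the quoted 'EOF' prevents the shell from expanding anything inside.
cat > hello.R <<'EOF'
# Report which node the script ran on.
node <- Sys.info()[["nodename"]]
cat("Hello world from", node, "\n")
EOF
```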

The SLURM jobscript to execute the above script will look as follows:
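A minimal sketch of such a jobscript, reusing the resource options from the interactive example earlier on this page. The file names run_hello.slurm and hello.R, the job name, and the 10 minute time limit are assumptions. Site-specific options such as account or partition are omitted and may be required.

```shell
# Write the jobscript; the quoted 'EOF' keeps $USER unexpanded until the job runs.
cat > run_hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=r-hello
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --hint=nomultithread

module load cray-R

# Needed only if the script uses packages installed in your own library.
export R_LIBS=/scratch/$USER/rlibs

# srun launches Rscript on the allocated compute node.
srun -c 32 --hint=nomultithread Rscript hello.R
EOF

# Submit with: sbatch run_hello.slurm
```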