MongoDB on compute nodes of Shaheen

 

In this document, we will explore how to launch a MongoDB server on a Shaheen compute node and then connect to it interactively from another Shaheen compute node.

The server is submitted to SLURM as a batch job and can run for at most 24 hours.

We will use mongo from a Singularity container. The singularityhub/mongo repository provides a Singularity definition file (def file) for creating a mongo image. The image can only be built on an Ibex compute node, because Shaheen does not support creating images from a Singularity definition file. You therefore need access to Ibex for this step, which only has to be done once.

Ibex Jobscript to create mongo image file

For this step, you need to be able to run a job on the Ibex cluster.

First clone the git repository containing the Singularity definition file to create the image:

cd $HOME
git clone https://github.com/singularityhub/mongo.git

The jobscript looks as follows:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --ntasks=1

module load singularity
cd $HOME/mongo
export XDG_RUNTIME_DIR=$HOME
singularity build --fakeroot mongo.sif Singularity

A successful completion should result in creation of a singularity image file mongo.sif.

Working with the Image

Since the /home filesystem is shared between Ibex and Shaheen, you can access this image file from a Shaheen login node as well.

Let’s switch back to Shaheen. Copy or move your image file mongo.sif to somewhere in your project directory. For example, I have copied mine to /project/k01/shaima0d/mongo_test. MongoDB requires a writable space to do some housekeeping for the database. We need to create a directory, e.g. data, and bind it when launching the database instance.

cd /project/k01/shaima0d/mongo_test
mkdir data

Here is what the database launch jobscript looks like:

The above jobscript should launch a mongodb daemon in a secure manner. Now we are ready to connect to it with our client. Note the IP address in the slurm-xxxxxx.out file of the job running the database server, e.g. 10.109.197.13.
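As a small aid for this step, the address can be pulled out of the log programmatically. The following is a minimal Python sketch, assuming the address is printed on the first line of the SLURM output file (as stated in the pymongo section of this page). The function names are hypothetical.

```python
import re
from typing import Optional

# Matches four dot-separated groups of 1-3 digits. It does not check that
# each group is within 0-255, which is enough for reading a log line.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def first_ipv4(text: str) -> Optional[str]:
    """Return the first IPv4-looking token in text, or None if there is none."""
    match = IPV4_PATTERN.search(text)
    return match.group(0) if match else None


def server_ip_from_slurm_log(path: str) -> Optional[str]:
    """Read only the first line of a SLURM output file and extract the address."""
    with open(path) as handle:
        return first_ipv4(handle.readline())
```

For example, `server_ip_from_slurm_log("slurm-xxxxxx.out")` would return the address as a string, or `None` if the first line contains no IPv4-looking token.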

Load the singularity module and ask for an interactive session with the srun command:

After the resources are allocated, you will see output like the following:

Since the mongod launched in the jobscript listens on the Cray Aries interconnect, the client must run on a compute node in order to reach the IP address of the device where the server is running. The client will not work from a login node.

Note that the legacy mongo shell is no longer included in server packages as of MongoDB 6.0. It has been superseded by mongosh (https://www.mongodb.com/docs/mongodb-shell/).

 

Using pymongo Driver

Once the MongoDB server is running using mongod as described above, we can interact with it using the pymongo driver, the de facto way to use MongoDB from within Python.

The following is an example Python script:
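The sketch below illustrates what such a script might contain, under stated assumptions. It assumes that pymongo is installed, that the server listens on the default port 27017, and that it accepts connections without credentials (if authentication is enabled, a username and password would have to be added to the URI). The names `build_uri`, `insert_and_count`, `test_db` and `test_collection` are hypothetical.

```python
def build_uri(host: str, port: int = 27017) -> str:
    """Build a MongoDB connection URI for an unauthenticated server."""
    return f"mongodb://{host}:{port}/"


def insert_and_count(host: str, port: int = 27017) -> int:
    """Insert one test document and return the collection's document count."""
    # Imported here so that build_uri stays usable where pymongo is absent.
    from pymongo import MongoClient

    # Fail after 5 seconds instead of waiting for the default timeout
    # when the server cannot be reached.
    client = MongoClient(build_uri(host, port), serverSelectionTimeoutMS=5000)
    try:
        # Hypothetical database and collection names, created on first insert.
        collection = client["test_db"]["test_collection"]
        collection.insert_one({"system": "Shaheen", "purpose": "connectivity test"})
        return collection.count_documents({})
    finally:
        client.close()
```

On a compute node, `insert_and_count("10.128.0.95")` would insert one document and return how many documents the collection then holds.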

 

The above test can run in a separate jobscript. We need to parse the IP address where our MongoDB server is running. This is printed in the first line of the slurm output file of the MongoDB server job we submitted. For example, our server is running on IP address 10.128.0.95. The following jobscript can be submitted to run the client, which launches the pymongo Python test.

The output looks as follows:

Using mongodump

To create a binary dump of a database and/or a collection, one can run mongodump as a separate job. The following example jobscript creates a gzip archive of an existing database. It assumes that a MongoDB server is already running as described above, and that the IP address of its host is 10.128.0.95:
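The command line of such a dump can be sketched in Python as follows. This is a sketch under assumptions rather than the jobscript itself. It assumes that `mongodump` is on the `PATH` (when using the Singularity image, the command would need to be run through the container instead) and that the server listens on the default port 27017. The helper names `dump_command` and `run_dump` are hypothetical. `--host`, `--port`, `--gzip` and `--archive` are standard `mongodump` options.

```python
import datetime
import subprocess
from typing import List, Optional


def dump_command(host: str,
                 day: Optional[datetime.date] = None,
                 port: int = 27017) -> List[str]:
    """Assemble a mongodump call that writes one gzip-compressed archive."""
    day = day or datetime.date.today()
    # Archive is named after the date, e.g. data_2021-02-24.gz
    archive = f"data_{day.isoformat()}.gz"
    return [
        "mongodump",
        f"--host={host}",
        f"--port={port}",
        "--gzip",
        f"--archive={archive}",
    ]


def run_dump(host: str) -> None:
    """Run the dump in the current working directory; raise if it fails."""
    subprocess.run(dump_command(host), check=True)
```

Calling `run_dump("10.128.0.95")` would write the archive into the present working directory.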

This should create a file data_2021-02-24.gz (date may vary) in your present working directory.

One can also run the above command as an interactive operation in a salloc session:

Using mongorestore

Once you have a compressed dump of your database/collection, you can copy it to a remote destination and restore your database there. For instance, if we have a compressed file data_2021-02-24.gz, I can scp it to my workstation/laptop, where I have a MongoDB installation, and restore it there.

I installed mongodb in a conda environment.

First, I start a new mongodb server on my local machine on localhost:

Now we can start the restoration step in a new terminal:
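The restore step can be sketched as a Python wrapper around `mongorestore`. The sketch assumes that the tool is on the `PATH` and that the local server listens on localhost at the default port 27017. The helper names are hypothetical. `--gzip` and `--archive` are standard `mongorestore` options and match the gzip archive format of the dump.

```python
import subprocess
from typing import List


def restore_command(archive: str,
                    host: str = "localhost",
                    port: int = 27017) -> List[str]:
    """Assemble a mongorestore call that reads a gzip-compressed archive."""
    return [
        "mongorestore",
        f"--host={host}",
        f"--port={port}",
        "--gzip",
        f"--archive={archive}",
    ]


def run_restore(archive: str) -> None:
    """Restore the archive into the local server; raise if the tool fails."""
    subprocess.run(restore_command(archive), check=True)
```

With the file from the example above, this would be `run_restore("data_2021-02-24.gz")`.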

Let us see if it has been ingested in our mongodb server:
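One way to check is from Python with pymongo, by listing every database on the local server together with its collections. This is a minimal sketch, assuming pymongo is installed and the local server listens on localhost:27017 without authentication. The function names are hypothetical.

```python
from typing import Dict, List


def collections_by_database(client) -> Dict[str, List[str]]:
    """Map each database name on the server to its collection names."""
    return {
        name: client[name].list_collection_names()
        for name in client.list_database_names()
    }


def inspect_local_server(uri: str = "mongodb://localhost:27017/") -> Dict[str, List[str]]:
    """Connect to the local server and report what it currently holds."""
    # Imported here so that the helper above can be used without pymongo.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        return collections_by_database(client)
    finally:
        client.close()
```

If the restore succeeded, the restored database and its collections should appear in the dictionary returned by `inspect_local_server()`, next to MongoDB's built-in databases such as `admin` and `local`.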