MongoDB on compute nodes of Shaheen
In this document, we will explore how to launch a MongoDB server on a Shaheen compute node and then connect to it interactively from another Shaheen compute node. The server will be submitted as a batch job to SLURM and can run for no more than 24 hours.
We will run mongo from a Singularity container. Singularity provides a Singularity definition (def) file to create a mongo image. Building the image can be done only on an Ibex compute node; Shaheen does not support creating images from a Singularity definition file, so you need access to Ibex for this step. This is a one-off step.
Ibex Jobscript to create the mongo image file
For this step, you should be able to run a job on the Ibex cluster.
First clone the git repository containing the Singularity definition file to create the image:
cd $HOME
git clone https://github.com/singularityhub/mongo.git
The jobscript looks as follows:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
module load singularity
cd $HOME/mongo
export XDG_RUNTIME_DIR=$HOME
singularity build --fakeroot mongo.sif Singularity
A successful completion should result in the creation of a Singularity image file, mongo.sif.
Working with the Image
Since the /home filesystem is shared between Ibex and Shaheen, you can access this image file from a Shaheen login node as well.
Let’s switch back to Shaheen. Copy or move your image file mongo.sif to somewhere in your project directory. For example, I have copied mine to /project/k01/shaima0d/mongo_test. MongoDB requires a writable space to do some housekeeping for the database, so we need to create a directory, e.g. data, and bind it when launching the database instance.
cd /project/k01/shaima0d/mongo_test
mkdir data
Here is how the database launch jobscript looks:
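A minimal sketch of such a launch jobscript is shown below. The project path, the walltime, the Aries interface name ipogif0, and the container's /data/db path are assumptions; adjust them for your own setup.

```bash
#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --ntasks=1

module load singularity
cd /project/k01/shaima0d/mongo_test

# Print this node's IP address so clients know where to connect
# (ipogif0 is assumed to be the Aries interface name).
ip -4 addr show ipogif0 | awk '/inet /{print $2}' | cut -d/ -f1

# Bind the writable data directory into the container and start the
# daemon, listening on all interfaces so other compute nodes can reach it.
singularity exec --bind ./data:/data/db mongo.sif \
    mongod --bind_ip_all --dbpath /data/db
```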
The above jobscript should launch a mongod daemon in a secure manner. Now we are ready to connect to it with a client. Note the IP address from the slurm-xxxxxx.out file of the job where the database server is running, e.g. 10.109.197.13.
Load the singularity module and ask for an interactive session with the srun command:
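For example (the resource options here are illustrative):

```bash
module load singularity
srun --ntasks=1 --time=01:00:00 --pty /bin/bash
```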
After the resources are allocated, you will see output like the one below:
Since the mongod launched in the jobscript is listening on the Cray Aries interconnect, the client must run on a compute node to connect to the IP address of the device where this server is running. The client won't run on a login node. (Note: the legacy mongo shell is no longer included in server packages as of MongoDB 6.0; mongo has been superseded by mongosh.)
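Once on a compute node, the client can be started from the same container, pointing at the server's IP address noted earlier (depending on the MongoDB version in your image, the shell binary is mongo or mongosh):

```bash
singularity exec mongo.sif mongo --host 10.109.197.13 --port 27017
```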
Using the pymongo Driver
Once the Mongo server is running using mongod as described above, we can interact with it using the pymongo driver, the de facto way to use MongoDB from within Python. Following is an example Python script:
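A minimal sketch of such a script is shown below. The database and collection names and the sample document are illustrative, and the default host IP is the example address used later in this document.

```python
# Sketch: connect to a remote MongoDB server, insert a document, read it back.
import sys


def connection_uri(host: str, port: int = 27017) -> str:
    """Build a MongoDB connection URI for the given host and port."""
    return f"mongodb://{host}:{port}/"


def main(host: str) -> None:
    # Requires the pymongo package and a reachable mongod server.
    from pymongo import MongoClient

    client = MongoClient(connection_uri(host))
    collection = client["test_db"]["test_collection"]
    # Insert one document, then read it back.
    collection.insert_one({"name": "shaheen", "status": "ok"})
    print(collection.find_one({"name": "shaheen"}))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "10.128.0.95")
```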
The above test can run in a separate jobscript. We need to pass the IP address where our MongoDB server is running; this is printed in the first line of the slurm output file of the MongoDB server job we submitted. For example, our server is running on IP address 10.128.0.95. The following jobscript can be submitted to run the client, which launches the pymongo Python test.
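A sketch of such a client jobscript, assuming the pymongo test script is saved as test.py (an assumed filename) and a Python environment with pymongo is available on the compute node:

```bash
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=1

# Pass the server's IP address (taken from its slurm output file).
srun python3 test.py 10.128.0.95
```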
The output looks as follows:
Using mongodump
To create a binary dump of the database and/or a collection, one can run mongodump as a separate job. The following example jobscript creates a gzip archive of an existing database. It is assumed here that a mongodb server is already running as described above, and that the IP address of its host is 10.128.0.95.
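A sketch of such a jobscript (the walltime is illustrative, and it is assumed that mongodump is available inside the container image):

```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --ntasks=1

module load singularity
# Dump the whole server into a gzipped archive named with today's date.
srun singularity exec mongo.sif mongodump \
    --host 10.128.0.95 --port 27017 \
    --gzip --archive=data_$(date +%F).gz
```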
This should create a file data_2021-02-24.gz (the date may vary) in your present working directory.
One can also run the above command interactively in a salloc session:
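For example (resource options are illustrative):

```bash
salloc --ntasks=1 --time=00:30:00
module load singularity
srun singularity exec mongo.sif mongodump \
    --host 10.128.0.95 --gzip --archive=data_$(date +%F).gz
```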
Using mongorestore
Once you have a compressed dump of your database/collection, you can copy it to a remote destination and restore your database there. For instance, if we have a compressed file data_2021-02-24.gz, I can scp it to my workstation/laptop, where I have a mongodb installation, and restore it there.
I installed mongodb in a conda environment.
First, I start a new mongodb server on my local machine on localhost:
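A typical invocation, assuming a fresh local data directory (the path and port are illustrative):

```bash
mkdir -p ~/mongo_local/data
mongod --dbpath ~/mongo_local/data --bind_ip localhost --port 27017
```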
Now we can start the restoration step in a new terminal:
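A typical restore command for a gzipped archive produced by mongodump:

```bash
mongorestore --host localhost --port 27017 \
    --gzip --archive=data_2021-02-24.gz
```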
Let us see if it has been ingested in our mongodb server:
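For example, listing the databases from the shell (use mongo instead of mongosh on older installations):

```bash
mongosh --host localhost --port 27017 --eval "db.adminCommand({ listDatabases: 1 })"
```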