Working with Singularity images containing Conda environments

Installing software with conda environments is convenient and highly productive, though not always the most performant option. Containerizing the conda runtime adds portability while retaining that productivity. This guide covers working with images that contain software installed in conda environments, and how to activate a particular conda environment as the entrypoint when running with the Singularity container platform on KSL systems.

Creating a Docker image from Dockerfile

First, build an image on top of a base image that contains the conda and mamba package managers (the latter is a faster reimplementation of conda that downloads packages in parallel). The Dockerfile shown below is built with the docker build command on your local machine (workstation or laptop), where Docker is installed. As an example, the Dockerfile installs a bioinformatics application, genomad, under a directory called /software:

    FROM krccl/miniforge:latest
    LABEL maintainer="Mohsin Ahmed Shaikh (mohsin.shaikh@kaust.edu.sa)"
    LABEL version="1.5.0"

    # Create a conda environment named genomad, the name entrypoint.sh activates
    RUN mamba create -n genomad -y -c conda-forge -c bioconda genomad==1.5.0 && \
        mamba clean --all --yes

    WORKDIR /workdir

    # Install the entrypoint script and make it executable
    COPY entrypoint.sh /software/entrypoint.sh
    RUN chmod +x /software/entrypoint.sh
    ENTRYPOINT ["/software/entrypoint.sh"]

Here, entrypoint.sh first activates the target conda environment and then runs the arguments given on the container's command line as the command:

    #!/bin/bash
    # Initialize conda
    source /software/bin/activate genomad
    exec "$@"
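The effect of ending the script with exec "$@" can be tried without conda or Docker. The sketch below is only an illustration: demo_entrypoint.sh and DEMO_ENV are hypothetical names, and the export line stands in for the conda activation step.

```shell
# Write a stand-in entrypoint script (hypothetical name and variable).
cat > demo_entrypoint.sh <<'EOF'
#!/bin/bash
# Stand-in for "source /software/bin/activate genomad": prepare the environment.
export DEMO_ENV="activated"
# Replace this shell with the command passed as arguments.
exec "$@"
EOF
chmod +x demo_entrypoint.sh

# The arguments become the command, and it sees the prepared environment.
./demo_entrypoint.sh sh -c 'echo "$DEMO_ENV"'
```

Because of exec, the requested command replaces the wrapper shell instead of running as its child, so signals and the exit status go directly to that command.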

To build a Docker image from this Dockerfile:

docker build -t genomad .

The above command looks for a file called Dockerfile in the current directory and builds a Docker image into the image store on your local machine. To push this image to DockerHub, log in, tag the image with the name of the target repository (which starts with your Docker username), and then push it to the DockerHub registry.
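One common way to publish a locally built image is to tag it and push it. The sketch below assumes a placeholder DockerHub username (your-dockerhub-user is hypothetical) and reuses the version label from the Dockerfile as the tag. The run helper is an illustrative convenience, not part of Docker: it prints each command and executes it only when DRY_RUN=0.

```shell
# Hypothetical placeholder; set DOCKER_USER to your own DockerHub username.
DOCKER_USER="${DOCKER_USER:-your-dockerhub-user}"
TARGET="${DOCKER_USER}/genomad:1.5.0"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run docker login                    # authenticate against DockerHub
run docker tag genomad "${TARGET}"  # name the local image after the target repository
run docker push "${TARGET}"         # upload the tagged image
```

Run it once as is to review the commands, then again with DRY_RUN=0 to execute them.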

Creating a modified Singularity Image File from Docker image

On the HPC systems of KSL, we use Singularity as the container platform. To run the container on Ibex, we pull the image we built in the previous step. Singularity works with the Singularity Image Format (SIF), and the singularity pull or singularity build command creates a SIF file from the Docker image. By default, Singularity disables any entrypoint behavior in Docker images. To re-enable the entrypoint, we rebuild the genomad image into SIF using a Singularity definition file, which describes what to change in the base Docker image. Our definition file, genomad.def, is minimal: it activates the target Conda environment and allows running any command passed on the singularity command line.
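A minimal sketch of what such a definition file could look like, written out here with a shell here-document. It assumes the image was pushed to DockerHub under a placeholder repository name (your-dockerhub-user/genomad:1.5.0 is hypothetical) and that the entrypoint script sits at /software/entrypoint.sh as in the Dockerfile. The genomad.def actually used on Ibex may differ.

```shell
# Hypothetical placeholder; point this at your own DockerHub repository.
DOCKER_IMAGE="${DOCKER_IMAGE:-your-dockerhub-user/genomad:1.5.0}"

# Write the definition file. \$@ is escaped so that a literal "$@" lands in the file.
cat > genomad.def <<EOF
Bootstrap: docker
From: ${DOCKER_IMAGE}

%runscript
    exec /software/entrypoint.sh "\$@"
EOF

cat genomad.def
```

The %runscript section is what Singularity executes for singularity run, or when the SIF file itself is executed, so handing the arguments to entrypoint.sh restores the behavior of the Docker entrypoint.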

In the genomad.def file, we first pull our Docker image from DockerHub. We then add instructions that run a script when a container is created, which activates our Conda environment.

Building SIF images is only possible on Ibex compute nodes. We therefore write a SLURM job script that submits the build to a compute node and uses Singularity's fakeroot feature. fakeroot is required because building a Singularity image from a definition file needs a temporary privilege escalation. The job script requests a compute node and runs singularity build with the --fakeroot option.
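A sketch of such a job script, saved to a file with a here-document. The file name build_genomad.slurm, the job name, the one-hour time limit, and the module load singularity line are assumptions for illustration. Adjust them to your allocation and to the modules available on Ibex.

```shell
# Write the job script (file name and resource requests are illustrative).
cat > build_genomad.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=build-genomad
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Assumed module name; check "module avail" on your system.
module load singularity

# Build the SIF image from the definition file without real root privileges.
singularity build --fakeroot genomad.sif genomad.def
EOF

# Submit from a login node with:
#   sbatch build_genomad.slurm
```

The time limit is set generously because the build can take a while for large Docker images.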

Note that this job can take longer than 10 minutes, depending on the size of the docker image.

Upon successful completion of the SLURM job, you should end up with a genomad.sif file, which is executable.

Running a Container from the new Image

Now that we have a Singularity image for our application, which was built with a conda environment, we can create a container to run our commands. As an example, we query the help of the genomad application.
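A sketch of such a call, assuming genomad.sif is in the current directory and that its runscript activates the conda environment and then executes its arguments, as described in the previous section. The availability check is an illustrative addition.

```shell
# Path to the image built earlier; override SIF if it lives elsewhere.
SIF="${SIF:-genomad.sif}"

if command -v singularity >/dev/null 2>&1 && [ -f "${SIF}" ]; then
    # Everything after the image name is passed to the runscript as the command.
    singularity run "${SIF}" genomad --help
else
    echo "singularity or ${SIF} not found; load the module and build the image first" >&2
fi
```

Since the SIF file is executable, ./genomad.sif genomad --help should behave the same way as the singularity run call.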