Motivation

Sometimes you need to copy a large number of files from /scratch to /project or vice versa. Both cp and rsync are convenient, but sometimes the task calls for more speed than they offer.

Distributed Copy

dcp, or distributed copy, is an MPI-based copy tool developed by Lawrence Livermore National Laboratory (LLNL) as part of its mpifileutils suite. It is installed on Shaheen. The following example jobscript launches a data-moving job with dcp:

#!/bin/bash

#SBATCH --ntasks=4
#SBATCH --time=01:00:00
#SBATCH --hint=nomultithread

module load mpifileutils
time srun -n ${SLURM_NTASKS} dcp --verbose --progress 60 --preserve /path/to/source/directory /path/to/destination/directory

The above script launches dcp in parallel with 4 MPI processes. --progress 60 means that the progress of the operation is reported every 60 seconds. --preserve means that the ACL permissions, group ownership, timestamps and extended attributes of the source files are preserved on the copies in the destination directory.
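Once a job like the one above has finished, one quick sanity check is to compare how many directories, files and links exist on each side. The sketch below uses only find and wc; count_items is a hypothetical helper name, not part of mpifileutils, and the paths are the same placeholders used in the jobscript. It assumes no file names contain newline characters, and it does not compare file contents.

```shell
#!/bin/bash
# count_items DIR
# Hypothetical helper: counts directories, regular files and symbolic links
# below DIR (DIR itself is excluded). Symbolic links are not followed.
count_items() {
    local root="$1"
    local dirs files links
    dirs=$(find "$root" -mindepth 1 -type d | wc -l)
    files=$(find "$root" -mindepth 1 -type f | wc -l)
    links=$(find "$root" -mindepth 1 -type l | wc -l)
    # $(( )) strips any padding that wc may add around the number.
    echo "dirs=$((dirs)) files=$((files)) links=$((links))"
}

# Example with placeholder paths (uncomment and adjust before use):
# if [ "$(count_items /path/to/source/directory)" = \
#      "$(count_items /path/to/destination/directory)" ]; then
#     echo "item counts match"
# fi
```

Matching counts only show that the same number of items arrived; they say nothing about sizes or checksums.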

The following is an example output:

[2021-01-21T16:01:51] Preserving file attributes.
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/PinatuboInitialStage
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/README.txt
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/build_alex.slurm
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/build_alex2.slurm
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/build_own.slurm
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/kuwait_heavy.slurm
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/minmax.ncl
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/nasaballon.slurm
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/run_wrf_371_kuwait_heavy.sh
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/run_wrf_371_nasaballoon_light.sh
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/slurm-17933845.out
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/slurm-17933846.out
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/slurm-17933847.out
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/slurm-17933848.out
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/slurm-17934624.out
[2021-01-21T16:01:51] Walking /project/k01/shaima0d/Shaheen3/Stenchikov/submit_script.sh
[2021-01-21T16:01:52] Walked 7844 items in 0.595307 secs (13176.397504 items/sec) ...
[2021-01-21T16:01:52] Walked 7844 items in 0.595524 seconds (13171.591813 items/sec)
[2021-01-21T16:01:52] Copying to /scratch/shaima0d/Shaheen3/Stenchikov
[2021-01-21T16:01:52] Items: 7844
[2021-01-21T16:01:52]   Directories: 189
[2021-01-21T16:01:52]   Files: 7247
[2021-01-21T16:01:52]   Links: 408
[2021-01-21T16:01:52] Data: 531.085 GB (75.042 MB per file)
[2021-01-21T16:01:52] Creating directories.
[2021-01-21T16:01:52]   level=6 min=0 max=1 sum=1 rate=272.853500/sec secs=0.003665
[2021-01-21T16:01:52]   level=7 min=0 max=5 sum=19 rate=515.727600/sec secs=0.036841
[2021-01-21T16:01:52]   level=8 min=0 max=10 sum=59 rate=541.667256/sec secs=0.108923
[2021-01-21T16:01:52]   level=9 min=1 max=6 sum=33 rate=556.079307/sec secs=0.059344
[2021-01-21T16:01:52]   level=10 min=0 max=19 sum=60 rate=521.802914/sec secs=0.114986
[2021-01-21T16:01:52]   level=11 min=0 max=6 sum=12 rate=542.864132/sec secs=0.022105
[2021-01-21T16:01:52]   level=12 min=0 max=2 sum=4 rate=555.776195/sec secs=0.007197
[2021-01-21T16:01:52]   level=13 min=0 max=1 sum=1 rate=515.207468/sec secs=0.001941
[2021-01-21T16:01:52]   level=14 min=0 max=0 sum=0 rate=0.000000/sec secs=0.000001
[2021-01-21T16:01:52] Created 189 directories in 0.355161 seconds (532.153096 items/sec)
[2021-01-21T16:01:52] Creating files.
[2021-01-21T16:01:52]   level=6 min=0 max=6 sum=15 rate=460.022813 secs=0.032607
[2021-01-21T16:01:52]   level=7 min=0 max=7 sum=25 rate=471.742915 secs=0.052995
[2021-01-21T16:02:01]   level=8 min=141 max=540 sum=3995 rate=434.750857 secs=9.189171
[2021-01-21T16:02:03]   level=9 min=1 max=155 sum=516 rate=452.639110 secs=1.139981
[2021-01-21T16:02:07]   level=10 min=4 max=382 sum=1763 rate=435.794907 secs=4.045481
[2021-01-21T16:02:09]   level=11 min=0 max=260 sum=1039 rate=449.518504 secs=2.311362
[2021-01-21T16:02:10]   level=12 min=9 max=66 sum=249 rate=362.368935 secs=0.687145
[2021-01-21T16:02:10]   level=13 min=0 max=38 sum=47 rate=416.477838 secs=0.112851
[2021-01-21T16:02:10]   level=14 min=0 max=4 sum=6 rate=392.927444 secs=0.015270
[2021-01-21T16:02:10] Created 7655 items in 17.587330 seconds (435.256520 items/sec)
[2021-01-21T16:02:10] Copying data.
[2021-01-21T16:03:10] Copied 102.215 GB in 60.072 secs (1.702 GB/s) ...
[2021-01-21T16:04:10] Copied 202.647 GB in 120.113 secs (1.687 GB/s) ...
[2021-01-21T16:05:10] Copied 302.142 GB in 180.145 secs (1.677 GB/s) ...
[2021-01-21T16:06:10] Copied 402.684 GB in 240.246 secs (1.676 GB/s) ...
[2021-01-21T16:07:51] Copied 499.097 GB in 341.481 secs (1.462 GB/s) ...
[2021-01-21T16:07:51] Copied 531.085 GB in 341.482 secs (1.555 GB/s) done
[2021-01-21T16:07:51] Copy data: 531.085 GB (570247967642 bytes)
[2021-01-21T16:07:51] Copy rate: 1.555 GB/s (570247967642 bytes in 341.481616 seconds)
[2021-01-21T16:07:51] Syncing data to disk.
[2021-01-21T16:07:52] Sync completed in 0.716662 seconds.
[2021-01-21T16:07:52] Setting ownership, permissions, and timestamps.
[2021-01-21T16:08:04] Updated 7844 items in 12.315612 seconds (636.915157 items/sec)
[2021-01-21T16:08:04] Syncing directory updates to disk.
[2021-01-21T16:08:04] Sync completed in 0.055182 seconds.
[2021-01-21T16:08:04] Started: Jan-21-2021,16:01:52
[2021-01-21T16:08:04] Completed: Jan-21-2021,16:08:04
[2021-01-21T16:08:04] Seconds: 372.536
[2021-01-21T16:08:04] Items: 7844
[2021-01-21T16:08:04]   Directories: 189
[2021-01-21T16:08:04]   Files: 7247
[2021-01-21T16:08:04]   Links: 408
[2021-01-21T16:08:04] Data: 531.085 GB (570247967642 bytes)
[2021-01-21T16:08:04] Rate: 1.426 GB/s (570247967642 bytes in 372.536 seconds)
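
The summary figures in this output can be cross-checked from the byte count and the elapsed times. The numbers are consistent with "GB" meaning 2^30 bytes here; that reading is inferred from the arithmetic on this log, not taken from the dcp documentation. A minimal sketch using awk, where to_gib and rate_gib are hypothetical helper names:

```shell
#!/bin/bash
# to_gib BYTES
# Prints BYTES expressed in units of 2^30 bytes, with three decimals.
to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.3f\n", b / (1024 * 1024 * 1024) }'
}

# rate_gib BYTES SECONDS
# Prints the transfer rate in units of 2^30 bytes per second.
rate_gib() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.3f\n", b / (1024 * 1024 * 1024) / s }'
}

to_gib 570247967642                # 531.085 (the "Data" line)
rate_gib 570247967642 341.481616   # 1.555   (the "Copy rate" line)
rate_gib 570247967642 372.536      # 1.426   (the final "Rate" line)
```

The final rate is lower than the copy rate because the 372.536 seconds also cover creating directories and files and setting ownership, permissions, and timestamps.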

Benchmark

As a benchmark, let's copy 37760 CSV files of 6.5 kB each (241.085 MB in total).

The table below compares the baseline time taken by the cp command to copy these files from /project to /scratch with the time taken by dcp using different numbers of MPI processes:

Tool    (MPI) processes    Time to completion    Speedup
cp      1 (serial)         1139.75 seconds       1
dcp     4                  888.966 seconds       1.282
dcp     16                 226.064 seconds       5.042
dcp     32                 401.479 seconds       2.838
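
The speedup column is the cp baseline time divided by the dcp time. As a worked check, the sketch below recomputes it with awk; speedup is a hypothetical helper name. Rounded to three decimals, the 32-process case comes out as 2.839, so the 2.838 listed in the table appears to be truncated rather than rounded.

```shell
#!/bin/bash
# speedup BASELINE_SECONDS PARALLEL_SECONDS
# Prints BASELINE_SECONDS / PARALLEL_SECONDS with three decimals.
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.3f\n", base / t }'
}

speedup 1139.75 888.966   # dcp,  4 processes: 1.282
speedup 1139.75 226.064   # dcp, 16 processes: 5.042
speedup 1139.75 401.479   # dcp, 32 processes: 2.839
```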

Some observations