Sunday Recap


Soft memory limit exceeded

Recently I started working with QM/MM Umbrella Sampling calculations. I use Amber for the Molecular Mechanics part and Terachem for the electronic structure calculations. Communication between the two programs is not a trivial issue, but it has been solved sufficiently with the terachem-protocol-buffer (TCPB). Also, my job scheduler is Slurm. The number of files produced during these runs and the disk space they occupy are massive: 20 umbrella windows alone produced ~15,000,000 files and 2.6 TB of data, which means I started receiving the notorious “exceeded memory limits” emails from the IT department. Naturally, the question arises: how do I rigorously monitor the running jobs, their disk usage, and the number of files generated?

To understand how much disk space these files occupy and to count them, I wrote this bash script (I also added timing because the du command is pretty slow):

#!/bin/bash

MeasureMemory() {
        read -p "Enter index1: " index1
        read -p "Enter index2: " index2
        read -p "Enter step: " step

        total=0
        for (( i = index1; i <= index2; i += step )); do
                start=$(date +%s)
                # Report sizes in whole GiB so every directory is in the same unit
                # (summing the integer part of du -sh output would mix M and G)
                output=$(du -s -BG "d_${i}_0_")
                echo "$output"
                # Extract the leading integer (the size in GiB) from du's output
                num=$(echo "$output" | awk '{ if (match($0, /[0-9]+/)) print substr($0, RSTART, RLENGTH) }')
                total=$((total + num))
                end=$(date +%s)
                elapsed=$((end - start))
                echo "Iteration ${i} took ${elapsed} seconds to finish!"
        done
        echo "Total disk usage is: ${total}G"
}

NumberOfFiles() {
        read -p "Enter index1: " index1
        read -p "Enter index2: " index2
        read -p "Enter step: " step
        total=0
        for (( i = index1; i <= index2; i += step )); do
                # Count regular files (recursively) in each umbrella-window directory
                count=$(find "d_${i}_0_" -type f | wc -l)
                total=$((total + count))
                echo "Directory d_${i}_0_ contains ${count} files."
        done
        echo "Total number of files is: ${total}"
}


MeasureMemory
NumberOfFiles

Very cool things I learned while writing this script were:

  1. Bash performs brace expansion before any variable substitution, so you need a C-style loop when the loop bounds are variables (see the snippet after this list).
  2. The command date +%s outputs the current time as a Unix timestamp, which is the number of seconds that have elapsed since January 1, 1970 (known as the Unix epoch).
  3. You can use awk’s regular expression matching to extract the integer part from a string. Other (easier) ways exist.
  4. Not in the script, but I also used the very cool Linux sort command.
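
A minimal illustration of point 1 (the variable names here are just placeholders):

a=1; b=3
echo {$a..$b}    # prints "{1..3}" literally: brace expansion ran before $a and $b were substituted
for (( i = a; i <= b; i++ )); do echo "$i"; done    # the C-style loop handles variable bounds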

This script allowed me to gauge how much space this run will take on average, and I concluded that I have enough room before hitting the hard limit. From now on, I would like to make these estimates more rigorous by submitting a test run of ~20 steps. Then I will extrapolate the disk space the full run will need and compare it with the total space available in my work directory. To be added to the imaginary to-do list that I never have time to actually do-do.
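
A back-of-the-envelope version of that extrapolation could look like the snippet below (the test-run usage and step counts are made-up placeholders, and the df invocation assumes GNU coreutils):

test_gb=130        # disk usage of the ~20-step test run, in GiB (placeholder)
test_steps=20
total_steps=2000   # planned number of steps for the full run (placeholder)
projected=$(( test_gb * total_steps / test_steps ))
avail=$(df -BG --output=avail . | tail -n 1 | tr -dc '0-9')
echo "Projected usage: ${projected}G, available in work dir: ${avail}G"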

Lastly, there is a nice little manual provided by the HPC center I use at UNC-Chapel Hill that briefly explains all the ways you can monitor your jobs running on GPUs.
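
For quick checks without leaving the terminal, the standard Slurm commands are the ones below (replace <jobid> with the actual job id; for GPU utilization you would run nvidia-smi on the compute node itself):

squeue -u $USER                                         # what is running or pending
sstat -j <jobid> --format=JobID,MaxRSS                  # memory high-water mark of a running job
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State    # accounting info once job steps finish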

End-state free energy methods

While not the most accurate (their limitations are articulated very nicely in this paper), end-state methods are fast and can give reasonable answers for large systems. I read the original paper on the MMPBSA method and summarized it in this diagram drawn on my office's blackboard:
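
For context, the core bookkeeping of the method (the standard MM/PBSA decomposition, written out here from memory rather than from the diagram) is:

$$\Delta G_{\text{bind}} = \langle G_{\text{complex}}\rangle - \langle G_{\text{receptor}}\rangle - \langle G_{\text{ligand}}\rangle$$

$$G \approx E_{\text{MM}} + G_{\text{PB}} + G_{\text{SA}} - T S_{\text{MM}}$$

where E_MM is the gas-phase molecular mechanics energy (bonded + van der Waals + electrostatics), G_PB is the polar solvation free energy from the Poisson-Boltzmann equation, G_SA is the nonpolar solvation term estimated from the solvent-accessible surface area, and TS_MM is the solute entropy (often from normal-mode analysis), each averaged over snapshots from the MD trajectory.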