Dr David Green

Dr David Green, a QCIF eResearch Analyst at UQ, outlines what to do and what not to do when using a shared high-performance computer.

Please Do

  • Remember that you share the login nodes (and remote desktop environments where they are provided) with many other users. Be mindful of that.
  • If you need to run something “strenuous” interactively, use the batch system’s interactive-job mechanism (qsub -I …). This applies to compiling or testing code and to pre-processing or post-processing data.
  • Always use the local disk $TMPDIR location in your batch jobs unless there is a valid reason not to.
    • Valid reasons include:
      • My job runs on multiple nodes using MPI so I cannot use $TMPDIR.
      • My job needs more space than $TMPDIR can provide.
    • Invalid reasons include:
      • I am not interested in making my jobs run better.
      • Using $TMPDIR is too hard. (It isn’t really difficult to use $TMPDIR.)
  • If you need to transfer a lot of small files, bundle them into a single tar or zip file first and transfer that one file instead of many small ones.
  • If you are running jobs that are identical except for their input parameters and input file names, consider using a job array instead of submitting many almost identical individual jobs.
  • If you are running many thousands of jobs, and/or if your numerous tasks are relatively short lived, you should definitely consider using the Nimrod scientific workflow.
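
Several of the points above (working in $TMPDIR, bundling small files, and using a job array) can be combined in one job script. The following is a minimal sketch, not a site-approved template. It assumes a PBS Professional-style scheduler, where #PBS -J requests a job array and each sub-job sees $PBS_ARRAY_INDEX. The resource values, the inputs/ and results/ layout, and the process_one function are hypothetical stand-ins for your own settings and program.

```shell
#!/bin/bash
#PBS -N tmpdir_array_sketch
#PBS -l select=1:ncpus=1:mem=2GB
#PBS -l walltime=01:00:00
#PBS -J 1-10

# Sketch only: the directive values above, the inputs/ and results/ layout,
# and process_one are assumptions made for illustration.

# Hypothetical stand-in for the real program: reads one file, writes one file.
process_one() {
    tr '[:lower:]' '[:upper:]' < "$1" > "$2"
}

# Run one array task: stage in to local scratch, compute there, bundle the
# outputs into a single tar file, and copy that one file back.
run_task() {
    local index="$1" workdir="$2" scratch="$3"
    local input="input_${index}.txt"

    mkdir -p "$scratch/out" "$workdir/results" || return 1
    cp "$workdir/inputs/$input" "$scratch/" || return 1    # stage in

    process_one "$scratch/$input" "$scratch/out/output_${index}.txt" || return 1

    # One archive instead of many small files on the shared filesystem.
    tar -czf "$scratch/results_${index}.tar.gz" -C "$scratch" out || return 1
    cp "$scratch/results_${index}.tar.gz" "$workdir/results/"    # stage out
}

# Only run automatically inside a batch job, where the scheduler sets
# PBS_JOBID, PBS_O_WORKDIR, TMPDIR and (for array jobs) PBS_ARRAY_INDEX.
if [ -n "${PBS_JOBID:-}" ]; then
    run_task "${PBS_ARRAY_INDEX:-1}" "$PBS_O_WORKDIR" "$TMPDIR"
fi
```

Under these assumptions, a single qsub of this script would queue ten sub-jobs, each reading inputs/input_N.txt and leaving one results_N.tar.gz in results/. Check your own system's documentation for the directives it actually requires.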

Please Don’t

  • Run processing for sustained periods on login nodes (or remote desktops). Your access to resources (CPU and RAM) is capped, but you should nonetheless avoid inconveniencing other users by running work on login nodes. Use the batch system instead.
  • Rapidly fire qsub commands at the batch system from a Shell or Python script, even though it is easy to write one. Rapidly firing jobs at an HPC can cause issues for the batch submission sub-system, and having large numbers of small jobs queued can also degrade scheduler performance. Use Job Arrays or Nimrod instead when you need to run large numbers of similar jobs.
  • Abuse the login environment with the watch command. In many circumstances, the output of the command being watched changes far less often than watch re-runs it.
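
If you do need to keep an eye on something from a login node, check less often and stop after a while. The watch command re-runs its command every 2 seconds by default, and its -n option lengthens that interval. The sketch below shows the same idea as a small bash function that also gives up after a fixed number of checks. The function name poll_quietly and the interval and count in the commented example are illustrative choices, not site recommendations.

```shell
#!/bin/bash

# poll_quietly INTERVAL_SECONDS MAX_CHECKS COMMAND [ARGS...]
# Runs COMMAND at most MAX_CHECKS times, sleeping INTERVAL_SECONDS between
# runs, then returns instead of polling forever.
poll_quietly() {
    local interval="$1" max_checks="$2"
    shift 2
    local i
    for ((i = 1; i <= max_checks; i++)); do
        "$@"
        # No need to sleep after the final check.
        if ((i < max_checks)); then
            sleep "$interval"
        fi
    done
}

# Example (assumed values): look at your own jobs once a minute, 30 times.
# poll_quietly 60 30 qstat -u "$USER"
```

A longer interval usually loses nothing, because queue state rarely changes from one second to the next.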

This article was first published on 12/12/2018.