Scaling and HPC resources

Jason D’Netto, QCIF's eResearch Analyst at QUT, outlines how to find the optimal balance of HPC parallelism for your code.

High-performance computing (HPC) machines are collections of computers, frequently referred to as nodes.
Parallel computing involves carrying out many calculations or processes simultaneously across HPC nodes. Large problems can often be divided into smaller ones, which can then be solved at the same time.
Parallelism across these nodes is generally achieved in a combination of two ways:

  • Symmetric multiprocessing (SMP) — multi-core parallel processing on the same node
  • Message Passing Interface (MPI) — parallel processing between nodes. 
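As a rough illustration of the two styles, the SMP side can be sketched in Python with the standard-library multiprocessing module (the MPI side would typically use a library such as mpi4py and an MPI launcher, so it is shown only as comments here; all function names are illustrative):

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Work unit executed by each worker."""
    return sum(chunk)

def smp_sum(data, workers=4):
    """SMP-style parallelism: worker processes share the cores of one node."""
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    data = list(range(100_000))
    assert smp_sum(data) == sum(data)
    # MPI-style equivalent (hypothetical, via mpi4py, run with `mpirun -n 4 ...`):
    #   from mpi4py import MPI
    #   comm = MPI.COMM_WORLD
    #   local = partial_sum(my_chunk)           # each rank computes its piece
    #   total = comm.reduce(local, op=MPI.SUM)  # ranks combine across nodes
```

The structure is the same in both cases — split the work, compute pieces independently, combine the results — but MPI ranks can live on different nodes, at the price of network communication.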

It is often not enough to just use more resources to speed up a computation.
MPI can create a process for each core of a node, accomplishing both multi-core processing on a node and communication between nodes.
Unfortunately, MPI carries a communication overhead, which can outweigh the benefit of its scalability. Using MPI under the wrong circumstances is like spending hours sharpening an axe to snap a twig.
Depending on your code, there may be an optimal balance between SMP and MPI parallel processing to achieve the best utilisation and speed for your computation.
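In practice, the chosen SMP/MPI balance is expressed in the job's resource request. A hedged sketch for a Slurm-based cluster follows (directive names differ under other schedulers such as PBS, and the program name is a hypothetical hybrid MPI + threaded binary):

```shell
#!/bin/bash
#SBATCH --nodes=2              # MPI across 2 nodes
#SBATCH --ntasks-per-node=2    # 2 MPI ranks per node
#SBATCH --cpus-per-task=12     # 12 SMP threads per rank
#SBATCH --time=01:00:00

# Tell the threaded (SMP) layer how many cores each rank may use.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_hybrid_program       # hypothetical hybrid MPI/OpenMP executable
```

Here 4 MPI ranks of 12 threads each fill 48 cores; shifting the same 48 cores towards more ranks with fewer threads, or fewer ranks with more threads, is exactly the balance the article describes.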
The first step in determining this optimal balance is to profile your code. Profiling is best done with a small sample of data that exercises every function at least once while running only a fraction of the iterations of the full computation.
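For example, Python's standard-library cProfile module can profile such a sample run and report where the time goes (the pipeline functions here are illustrative stand-ins for your own code):

```python
import cProfile
import io
import pstats

def preprocess(data):
    """Illustrative first stage: double every value."""
    return [x * 2 for x in data]

def analyse(data):
    """Illustrative second stage: sum of squares."""
    return sum(x * x for x in data)

def pipeline(data):
    return analyse(preprocess(data))

# Profile a small sample that calls every function at least once.
sample = list(range(10_000))
profiler = cProfile.Profile()
result = profiler.runcall(pipeline, sample)

# Print the top 5 functions by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Most languages offer an equivalent — for example Rprof in R — and the report tells you which functions dominate the runtime and are therefore worth parallelising or replacing.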
There is usually a profiler available for the coding language of your choice.
If your code is taking too long to process, and the slowest part is something you did not write—such as a built-in R function—you can try to find a software library that will provide a faster alternative to the slow function.
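As a toy illustration of swapping a slow function for a library alternative (the same idea applies to slow built-ins in R), the standard-library timeit module can confirm the replacement is both equivalent and faster:

```python
import timeit

def slow_concat(words):
    """Naive approach: repeated string concatenation in a loop."""
    out = ""
    for w in words:
        out += w + " "
    return out.rstrip()

def fast_concat(words):
    """Library alternative: str.join builds the string in one pass."""
    return " ".join(words)

words = ["token"] * 5000
# First check the faster alternative gives identical results...
assert slow_concat(words) == fast_concat(words)

# ...then compare runtimes on the same input.
slow_t = timeit.timeit(lambda: slow_concat(words), number=20)
fast_t = timeit.timeit(lambda: fast_concat(words), number=20)
print(f"naive: {slow_t:.4f}s  join: {fast_t:.4f}s")
```

Always verify correctness before timing: a faster function that returns different answers is not a replacement.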
The next step is to do a few small trials on smaller data sets to determine the best mix of SMP and MPI parallelism for your code.
Once your trials are done, it is time to run the full data set with the balance of parallelism found for your code.
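These trials amount to a small parameter sweep over rank/thread combinations. A minimal sketch, assuming a fixed core budget and a run_trial callable that stands in for submitting and timing a short benchmark job (both the cost model and the function names are hypothetical):

```python
def candidate_mixes(total_cores, cores_per_node):
    """Enumerate (mpi_ranks, threads_per_rank) pairs that exactly fill
    total_cores, keeping each rank's threads within a single node."""
    mixes = []
    for threads in range(1, cores_per_node + 1):
        if total_cores % threads == 0:
            mixes.append((total_cores // threads, threads))
    return mixes

def best_mix(total_cores, cores_per_node, run_trial):
    """Return the mix with the shortest trial runtime.
    run_trial(ranks, threads) -> wall-clock seconds for a small trial."""
    return min(candidate_mixes(total_cores, cores_per_node),
               key=lambda mix: run_trial(*mix))

# Made-up cost model: fixed compute work divided across all cores,
# plus a per-rank communication overhead (MPI's penalty).
fake_trial = lambda ranks, threads: 100.0 / (ranks * threads) + 0.5 * ranks

print(best_mix(48, 24, fake_trial))  # → (2, 24): fewer ranks, more threads
```

With this cost model the communication term dominates, so the sweep favours few MPI ranks with many SMP threads; real trial timings for your code may well point the other way.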
Talk to your local eResearch support staff if you have any questions about scaling and HPC resources.