Scaling and HPC resources

Jason D’Netto, QCIF’s former eResearch Analyst at QUT, outlines how to find the optimal balance of HPC parallelism for your code.

High-performance computing (HPC) machines are collections of computers, each of which is usually referred to as a node.

Parallel computing carries out many calculations or processes simultaneously across HPC nodes. Large problems can often be divided into smaller ones, which can then be solved at the same time.
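As a minimal sketch of that idea, the Python snippet below splits a list of numbers into chunks, sums each chunk in a separate worker, and then combines the partial results. The names split_into_chunks and parallel_sum are hypothetical, and a thread pool is used only to keep the sketch self-contained. On a real HPC system, CPU-bound work would normally be spread over separate processes or MPI ranks instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split data into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, n_workers=4):
    """Sum each chunk in its own worker, then combine the partial sums."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)
```

The pattern is the same whatever does the work: divide the problem, solve the pieces at the same time, and merge the results at the end.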

Parallelism for these nodes is generally done in a combination of two ways:

Symmetric multiprocessing (SMP): multi-core parallel processing on the same node, where the cores share memory.
Message Passing Interface (MPI): parallel processing between nodes, which exchange data by sending messages.

It is often not enough to just use more resources to speed up a computation.

MPI can create a process for each core of a node, accomplishing both multi-core processing on a node and communication between nodes.

Unfortunately, MPI carries a communication overhead, which can outweigh the benefit of spreading the work across more processes. Using MPI under the wrong circumstances would be like spending hours sharpening an axe to snap a twig.
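A toy cost model can make this trade-off concrete. The sketch below assumes, purely for illustration, that the compute time divides evenly among processes while every additional process adds a fixed communication cost. Real overheads depend on your code, your data and the interconnect, so the function names and the formula are assumptions rather than measurements.

```python
def estimated_runtime(work_seconds, n_processes, overhead_per_process):
    """Toy model: compute time shrinks as processes are added, while
    communication cost grows with each extra process (an assumption)."""
    compute = work_seconds / n_processes
    communication = overhead_per_process * (n_processes - 1)
    return compute + communication


def best_process_count(work_seconds, overhead_per_process, max_processes):
    """Return the process count with the lowest estimated runtime."""
    return min(
        range(1, max_processes + 1),
        key=lambda p: estimated_runtime(work_seconds, p, overhead_per_process),
    )
```

With 100 seconds of work and one second of overhead per extra process, this model is fastest at 10 processes (19 seconds), while 100 processes take 100 seconds, which is no faster than a single process.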

Depending on your code, there may be an optimal balance between SMP and MPI parallel processing to achieve the best utilisation and speed for your computation.
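To see which balances are available on a given node, it can help to list them. The sketch below is a hypothetical helper that assumes every MPI process runs the same number of SMP threads and that no cores are left idle. It returns each way of dividing a node's cores between the two.

```python
def hybrid_layouts(cores_per_node):
    """List (mpi_ranks_per_node, threads_per_rank) pairs that use every
    core on a node exactly once."""
    return [
        (ranks, cores_per_node // ranks)
        for ranks in range(1, cores_per_node + 1)
        if cores_per_node % ranks == 0
    ]


print(hybrid_layouts(8))  # [(1, 8), (2, 4), (4, 2), (8, 1)]
```

For an 8-core node, (1, 8) is one MPI process per node with eight threads, and (8, 1) is pure MPI with one process per core. The layouts in between are the hybrid candidates.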

The first step in determining this optimal balance is to profile your code. Profiling is best done with a small sample data set that exercises every function at least once, but runs only a fraction of the iterations of the full computation.

There is usually a profiler available for the coding language of your choice.
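In Python, for example, the standard library ships with cProfile. The sketch below profiles a small stand-in function and returns a report of the most time-consuming calls. Both slow_sum_of_squares and profile_call are hypothetical names chosen for illustration, and other languages have their own tools.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical stand-in for the expensive part of a computation."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


result, report = profile_call(slow_sum_of_squares, 10_000)
print(report)
```

The functions at the top of the report are where the run spends most of its time, so they are the first candidates for parallelising or replacing.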

If your code is taking too long to run, and the slowest part is something you did not write (such as a built-in R function), you can look for a software library that provides a faster alternative to the slow function.
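Before swapping a function, it is worth checking that the replacement gives the same answer and is actually faster on your data. The Python sketch below does both for two hypothetical implementations of a Euclidean norm: a hand-written loop and a version that delegates to math.hypot (Python 3.8 or later). The function names are assumptions for illustration, and the timings will vary between machines.

```python
import math
import timeit


def handwritten_norm(values):
    """Hand-written Euclidean norm: the hypothetical slow function."""
    total = 0.0
    for v in values:
        total += v * v
    return total ** 0.5


def library_norm(values):
    """The same calculation delegated to the standard library."""
    return math.hypot(*values)


def compare(functions, values, repeats=5):
    """Check that the functions agree, then return the best time for each."""
    reference = functions[0](values)
    timings = {}
    for func in functions:
        if not math.isclose(func(values), reference, rel_tol=1e-9):
            raise ValueError(f"{func.__name__} gives a different answer")
        timings[func.__name__] = min(
            timeit.repeat(lambda: func(values), number=10, repeat=repeats)
        )
    return timings


sample = [float(i) for i in range(1000)]
print(compare([handwritten_norm, library_norm], sample))
```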

The next step is to do a few small trials on smaller data sets to determine the best mix of SMP and MPI parallelism for your code.
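One way to organise such trials is a small timing harness. In the sketch below, run_trials is a hypothetical helper and task stands for whatever launches your code on a small data set with a given configuration, for example a pair of MPI processes per node and threads per process. How the task actually starts the job is left out because it depends on your site's scheduler.

```python
import time


def run_trials(task, configs, repeats=3):
    """Time task(config) for each config and return (best_config, timings).

    timings maps each config to its fastest wall-clock time in seconds.
    Configs must be hashable, for example tuples.
    """
    timings = {}
    for config in configs:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            task(config)
            best = min(best, time.perf_counter() - start)
        timings[config] = best
    best_config = min(timings, key=timings.get)
    return best_config, timings
```

Taking the fastest of several repeats reduces the effect of noise from other jobs sharing the machine, although the ranking found on a small data set may not hold exactly at full scale.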

Once your trials are done, it is time to run the full data set with the balance of parallelism found for your code.

Talk to your local eResearch support staff if you have any questions about scaling and HPC resources.

This article was first published on 13/11/2018.
