Dr Marlies Hankel, a QCIF eResearch Analyst at UQ, explains why people should spell out what they mean by parallel computing, as the term has several interpretations.
Researchers who use high-performance computers often describe their calculations as ‘parallel’, which can cause confusion because the word often means something completely different to the people who try to support them.
This single word is not sufficient to describe a workload or its resource requirements because it means different things to different people. For most calculations, programs and workloads, a more detailed description is needed to understand and judge the requirements.
Parallel workloads that use message passing for multiprocessing require all tasks to run at the same time and to communicate with each other. These workloads most likely use MPI (Message Passing Interface) or another message-passing protocol. They can be spread over several nodes, often across thousands of cores.
Parallelisation is used to speed up calculations and also to get around the memory constraints of a single node. This is sometimes referred to as distributed memory computing. These workloads usually require a large number of cores but not much memory or disk space, and they require a fast network between nodes.
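The message-passing pattern can be illustrated with a minimal sketch. It does not use MPI: as a stand-in, and purely as an assumption for illustration, Python threads play the role of MPI ranks and `queue.Queue` objects play the role of the network. The names `run_rank` and `distributed_sum` are hypothetical. What it shows is the structure described above: every task works on its own part of the data, and the tasks must all be running at the same time because they wait for messages from each other.

```python
import queue
import threading


def run_rank(rank, size, inboxes, data, out):
    """One simulated 'rank'. In a real MPI job this would be a separate process,
    possibly on another node, holding only its own slice of the data."""
    local = sum(data[rank::size])  # work on this rank's share only

    if rank == 0:
        total = local
        # Rank 0 blocks until every other rank has sent its partial sum.
        for _ in range(size - 1):
            total += inboxes[0].get()
        # Send the combined result back to every other rank.
        for other in range(1, size):
            inboxes[other].put(total)
    else:
        inboxes[0].put(local)        # send partial sum to rank 0
        total = inboxes[rank].get()  # block until rank 0 replies

    out[rank] = total


def distributed_sum(data, size=4):
    """Start all ranks together; none can finish unless all of them are running."""
    inboxes = [queue.Queue() for _ in range(size)]
    out = [None] * size
    ranks = [
        threading.Thread(target=run_rank, args=(r, size, inboxes, data, out))
        for r in range(size)
    ]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    return out


print(distributed_sum(list(range(1000)), size=4))
```

If any one rank were not started, rank 0 would wait forever for its message. This is why a scheduler has to allocate all cores for such a job at once, and why the speed of the connection between tasks matters.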
Parallel workloads that use shared memory multiprocessing also require all tasks to run at the same time. They can use all cores within a single node to speed up the calculations, and they are limited by the number of cores and the memory available on that node. These workloads are very diverse and may or may not require large memory or disk space.
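A shared-memory workload can be sketched in the same spirit. The example below assumes Python's standard `concurrent.futures.ThreadPoolExecutor`; the function names `count_in_chunk` and `shared_memory_count` are hypothetical. All workers read the same list in the memory of one machine, so no messages are exchanged and nothing is copied between tasks. Note that in standard CPython builds the global interpreter lock prevents threads from speeding up pure-Python arithmetic, so this shows the structure of such a workload rather than a performance recipe.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_in_chunk(values, start, stop, threshold):
    """Count entries above threshold in values[start:stop], reading the shared list in place."""
    return sum(1 for i in range(start, stop) if values[i] > threshold)


def shared_memory_count(values, threshold, workers=None):
    """Split the index range across workers that all see the same `values` object."""
    workers = workers or os.cpu_count() or 1
    chunk = max(1, -(-len(values) // workers))  # ceiling division
    bounds = [(i, min(i + chunk, len(values))) for i in range(0, len(values), chunk)]

    # The pool cannot usefully be larger than the number of cores on this one node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda b: count_in_chunk(values, b[0], b[1], threshold), bounds
        )
        return sum(parts)


print(shared_memory_count(list(range(100)), threshold=49, workers=4))
```

The default worker count comes from `os.cpu_count()`, which reflects the limit described above: the workload cannot grow beyond the cores and memory of a single node.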
Parallel workloads whose tasks do not need to run at the same time often run the same (or a similar) task many times, for statistical reasons or to cover a range of parameters. The tasks in such a set can run concurrently or one after another. These are array-job workloads. If the number of tasks exceeds 100, or runs into the thousands, the workload is considered high-throughput. These workloads can require large memory per task and large disk space and, in most cases, a very large number of cores.
There are also workloads that combine MPI and job arrays, shared memory and job arrays, or shared memory and MPI.
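An array job can be sketched as a script that is started many times, each time with a different task index. The sketch below assumes a scheduler that exposes the index in an environment variable; it reads `SLURM_ARRAY_TASK_ID`, which is what Slurm sets, but other schedulers use other names. The parameter grid (`TEMPERATURES`, `PRESSURES`, `SEEDS`) and the function `params_for_task` are hypothetical and only illustrate how one index selects one independent task.

```python
import itertools
import os

# Hypothetical parameter sweep: every combination is one independent task.
TEMPERATURES = [250, 300, 350]
PRESSURES = [1.0, 2.0]
SEEDS = range(5)  # repeated runs for statistics

GRID = list(itertools.product(TEMPERATURES, PRESSURES, SEEDS))  # 3 * 2 * 5 = 30 tasks


def params_for_task(task_id):
    """Map an array index to one (temperature, pressure, seed) combination."""
    if not 0 <= task_id < len(GRID):
        raise IndexError(f"task_id {task_id} outside 0..{len(GRID) - 1}")
    return GRID[task_id]


# Assumption: the scheduler sets this variable; default to task 0 when run by hand.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
temperature, pressure, seed = params_for_task(task_id)
print(f"task {task_id}: T={temperature} P={pressure} seed={seed}")
```

Because no task depends on another, the scheduler is free to run all 30 at once, a few at a time, or strictly one after another, depending on which cores are available.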
So, just saying “my calculations are parallel” is not enough for someone to understand what resources might be needed. Users should therefore always provide a detailed description to get appropriate advice and access to resources.
Please contact your institution’s QCIF eResearch Analyst for further advice.
This article was first published on 26/02/2020.