A University of Queensland-led research project has created an open online repository of molecular topologies that has become a globally significant resource used in computational drug design and materials science.
The Automated Topology Builder and Repository (ATB), also used for molecular simulations and the refinement of biological complexes, is currently playing its part in discovery research for a COVID-19 vaccine.
It is one of the largest and most comprehensive molecular structure and parameter databases of its type in the world.
UQ’s Professor Alan Mark and Dr Martin Stroet, based in the School of Chemistry and Molecular Biosciences (SCMB), lead the ATB research team in manually curating topologies of specific molecules. External researchers can also submit small molecules to the repository to create a new topology.
Having used QCIF’s QRIScloud throughout much of the project, which began in 2009, the ATB research team has greatly valued the flexibility and scalability of the cloud compute service.
The ATB uses a whopping 300–500 cores of QRIScloud compute power to run the pipelines that build its topologies.
Currently containing data on more than 420,000 molecules (which can be freely downloaded for academic use), the ATB has more than 12,300 registered users, with the user base increasing by 50–100 users per week.
“Our load varies dramatically. A small molecule might process in seconds, a large complex molecule in days,” said Prof. Mark.
“Previously, a single molecule could cause significant delays in processing new submissions and dissuade other users from using our facilities. QRIScloud allows us to draw on resources when needed. It also allows each user submission to the ATB to be managed in a completely independent manner. This makes the system very robust. A single failure does not fully disrupt the pipeline.”
QRIScloud has also enabled the ATB to be used in a number of teaching courses at UQ and around the world, with classes of 30–40 students able to submit requests to the repository and have all their answers returned simultaneously.
The ATB contains computational infrastructure including automated workflows, data processing tools and various analysis and visualisation packages.
The repository’s core algorithms and associated management packages consist of more than 56,000 lines of code developed in-house (mostly Python) and interface with a wide range of external packages.
As of June 2020, the system contained parameters for more than 420,000 compounds, representing an investment of more than 35 million CPU hours on national high-performance computing facilities in both Australia and the USA.
“The project would not be possible without the generous support of both very large and smaller scale computer facilities in Australia and around the world,” said Prof. Mark.
Molecules submitted by individual users undergo some initial steps on machines housed within SCMB before being transferred to QRIScloud for processing. Bulk processing of molecules in key molecular databases has been undertaken in collaboration with researchers at the USA’s Lawrence Livermore National Laboratory as well as using the Australian high-performance facilities in Canberra (NCI) and Western Australia (Pawsey Supercomputing Centre).
The ATB team is currently engaged in a major project, supported by NCI and Pawsey, to parameterise, at a high level, all biologically relevant forms of molecules that have passed through a phase II clinical trial, i.e. proven safe for human use. These molecules are of major interest to researchers using computational approaches to identify molecules that could be used directly to help respond to the COVID-19 pandemic.
Funds for the ATB project have come primarily from the Australian Research Council and the University of Queensland. Additional support has come from Sun Microsystems, UQ’s commercialisation arm UniQuest and QCIF.