QCIF develops a secure data analysis environment
13 December 2022
A QCIF-developed secure data analysis environment for researchers is currently being tested and is expected to go live in early 2023.
The platform, called KeyPoint, will enable researchers to access, analyse, manage and share sensitive research data in a scalable, fully governed and highly secure environment, whilst maintaining full control of their data at all times.
QCIF Bioinformatics and Data Science Director Dr Dominique (Dom) Gorse said working with sensitive data presents challenges for researchers, including accessibility, scalability, shareability and privacy concerns.
“KeyPoint is an innovative data infrastructure and digital solution that has been designed from the ground up to address those challenges by enabling data governance at scale,” said Dom.
“KeyPoint fulfills a missing component of Queensland’s research infrastructure and we are excited to bring this platform to life with significant research projects being onboarded.”
Several ground-breaking University of Queensland projects are early adopters of KeyPoint, including the Australian Centre of Excellence in Melanoma Imaging and Diagnosis, Aged Care Data Compare, the Australian Longitudinal Study on Women’s Health, the ATLAS Indigenous Primary Care Surveillance Network, and the Global Drug Survey. These projects are described in detail below.
Many more collaborative research projects across all research domains that involve sensitive data are likely to adopt KeyPoint next year.
KeyPoint was co-designed with researchers who are responsible for valuable data assets and who have a broad range of stringent data governance and analysis requirements. It comprises all the infrastructure, software, systems and analytical tools researchers need to conduct powerful analyses on authorised data and address their research questions.
Stephen Bird, QCIF’s Shared Infrastructure Services Manager, explained: “Each project has its own self-contained and self-managed workspace, called a ‘Vault’, in KeyPoint. Each Vault contains the project’s data sets that are being made available for specific research purposes. Each Vault contains strongly separated research activities. Each ‘Research Activity’ encompasses the data sets (or specific slices of the data sets) that have been shared to it, the set of researchers granted access to the Research Activity, and its own storage space to facilitate collaboration. Further, each researcher has their own personal storage space for their work, which is specific to a Research Activity.”
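The Vault structure described above can be pictured with a minimal sketch. This is an illustration only, not KeyPoint's implementation: the class names, fields and methods (`Vault`, `ResearchActivity`, `share`, `grant_access`, `can_read`) are hypothetical and simply mirror the terms used in the quote.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchActivity:
    """One strongly separated activity inside a Vault (hypothetical model)."""
    name: str
    shared_datasets: set[str] = field(default_factory=set)  # data sets or slices shared to it
    researchers: set[str] = field(default_factory=set)      # researchers granted access
    shared_storage: dict[str, bytes] = field(default_factory=dict)  # collaboration space
    # Personal storage is keyed by researcher and belongs to this activity only.
    personal_storage: dict[str, dict[str, bytes]] = field(default_factory=dict)

    def grant_access(self, researcher: str) -> None:
        self.researchers.add(researcher)
        self.personal_storage.setdefault(researcher, {})

    def can_read(self, researcher: str, dataset: str) -> bool:
        # Both conditions must hold: membership of the activity and
        # the data set having been shared to this activity.
        return researcher in self.researchers and dataset in self.shared_datasets


@dataclass
class Vault:
    """A project's self-contained workspace (hypothetical model)."""
    project: str
    datasets: set[str] = field(default_factory=set)
    activities: dict[str, ResearchActivity] = field(default_factory=dict)

    def create_activity(self, name: str) -> ResearchActivity:
        activity = ResearchActivity(name)
        self.activities[name] = activity
        return activity

    def share(self, dataset: str, activity_name: str) -> None:
        if dataset not in self.datasets:
            raise KeyError(f"{dataset!r} is not held in this Vault")
        self.activities[activity_name].shared_datasets.add(dataset)
```

In this sketch, a researcher added to one Research Activity sees nothing shared to another, which is one simple way to express the "strongly separated" property.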
KeyPoint enforces data governance and security consistent with the Five Data Sharing Principles: Safe Project, Safe People, Safe Settings, Safe Data and Safe Outputs.
All data entering and leaving a Vault in KeyPoint must be approved by the Vault’s ‘Data Steward’ (or their delegated ‘Data Approvers’) under the rules of the project’s data governance requirements.
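As a rough sketch of this approval rule, assuming a simple in-memory model: every transfer into or out of a Vault is refused unless the Data Steward or one of their delegated Data Approvers has signed it off. The names `VaultGate`, `TransferRequest`, `delegate`, `approve` and `transfer` are hypothetical and are not taken from KeyPoint.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Direction(Enum):
    INGRESS = "ingress"  # data entering the Vault
    EGRESS = "egress"    # data leaving the Vault


@dataclass
class TransferRequest:
    item: str
    direction: Direction
    requested_by: str
    approved_by: Optional[str] = None  # stays None until someone authorised approves


@dataclass
class VaultGate:
    """Hypothetical gatekeeper for one Vault."""
    data_steward: str
    data_approvers: set = field(default_factory=set)

    def delegate(self, steward: str, approver: str) -> None:
        # Only the Data Steward may nominate Data Approvers.
        if steward != self.data_steward:
            raise PermissionError("only the Data Steward can delegate approval")
        self.data_approvers.add(approver)

    def approve(self, request: TransferRequest, approver: str) -> TransferRequest:
        if approver != self.data_steward and approver not in self.data_approvers:
            raise PermissionError(f"{approver} may not approve transfers for this Vault")
        request.approved_by = approver
        return request

    def transfer(self, request: TransferRequest) -> str:
        # The same check applies in both directions.
        if request.approved_by is None:
            raise PermissionError("transfer has not been approved")
        return f"{request.direction.value} of {request.item} approved by {request.approved_by}"
```

A real deployment would also record the governance rule each approval was made under. That detail is left out here because the article does not describe it.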
Access to the entire KeyPoint environment is authenticated through the Australian Access Federation (AAF) using a user’s institutional login, coupled with an additional layer of authentication specific to KeyPoint.
Support for external collaborators without AAF access is provided through QCIF’s use of AAF’s ‘Virtual Home’ capability.
KeyPoint is being deployed in a separate, restricted-access region on QRIScloud, the QCIF-managed Queensland node of the ARDC Nectar Research Cloud. KeyPoint’s compute infrastructure provides workstation-scale analytics environments, including high-memory virtual desktops for data-intensive workloads and GPU-enabled virtual desktops for visualisation, machine learning and AI workloads.
By using KeyPoint, researchers can gain the required trust to receive sensitive data for analysis from Australian state and federal data custodians.
KeyPoint is currently designed for de-identified data that carries a risk of re-identification. Each project wishing to be onboarded to KeyPoint is required to perform a risk assessment to confirm that KeyPoint is suitable for the project.
Co-funders and partners of KeyPoint include the Australian Research Data Commons (ARDC), the Queensland Government and The University of Queensland. QCIF will look to collaborate with partners to secure further investment towards scaling up and enhancing the deployment of KeyPoint across a breadth of research domains and applications.
For more information, please visit our KeyPoint webpage.
To register your interest in using KeyPoint, or for further information about KeyPoint, please contact us: keypoint@qcif.edu.au.
Some of the projects currently testing and using KeyPoint:
Aged Care Data Compare (ACDC)
Challenge: Residential aged care facilities (RACFs) in Australia use a variety of software solutions to collect and manage data related to the assessment and care of residents. The data collected, and the way it is represented, vary between these solutions, which makes data sharing very difficult and hinders benchmarking of care quality across provider organisations.
Data: The Aged Care Data Compare (ACDC) project is developing a data hub to assemble de-identified resident assessment data from several RACFs in standardised form and enable benchmarking of the care quality of providers against their peers.
Solution: The data hub will be underpinned by health data standards to support the exchange of resident assessment data and a central repository to store assessment data supplied by participating RACFs. The health data exchange standard will be developed using FHIR (Fast Healthcare Interoperability Resources), an HL7 (Health Level Seven) specification. The data items and definitions in the interRAI long-term care facility instrument will be adopted to standardise the assessment items, while FHIR will be used to standardise their representations. KeyPoint will be used to benchmark individual provider care quality against other providers using a set of quality indicators, a critical part of the overall solution.
Transformation: The project expects to trial the data hub with data collected from about 30 RACFs.
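The benchmarking step described for ACDC can be illustrated with a small sketch. It assumes each provider has a single numeric value per quality indicator and compares it with the median of the other providers. The function name, the indicator values and the choice of a peer median are illustrative assumptions, not the project's actual method.

```python
from statistics import median


def benchmark(indicator_values: dict[str, float], provider: str) -> dict[str, float]:
    """Compare one provider's indicator value with the median of its peers.

    `indicator_values` maps a provider identifier to its value for one
    quality indicator (hypothetical input format).
    """
    peers = [value for name, value in indicator_values.items() if name != provider]
    if not peers:
        raise ValueError("at least one peer provider is needed for benchmarking")
    peer_median = median(peers)
    own_value = indicator_values[provider]
    return {
        "provider_value": own_value,
        "peer_median": peer_median,
        "difference": own_value - peer_median,  # negative means below the peer median
    }


# Made-up values for one indicator, expressed as events per 100 residents.
example = {"RACF-A": 10.0, "RACF-B": 20.0, "RACF-C": 30.0, "RACF-D": 40.0}
result = benchmark(example, "RACF-A")
```

Whether a lower or higher value is better depends on the indicator, so the sketch reports only the difference and leaves interpretation to the reader.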
The Australian Centre of Excellence in Melanoma Imaging and Diagnosis (ACEMID)
Challenge: The Australian Centre of Excellence in Melanoma Imaging and Diagnosis (ACEMID) is a multi-disciplinary and multi-site collaborative imaging program investigating the use of 3D Total Body Photography (TBP) to improve early detection of melanoma and other skin cancers.
Data: ACEMID is establishing a unique national research repository comprising 3D total body photography, clinical and dermoscopy images, and clinical and survey data. ACEMID will be a key data source for three Centres for Research Excellence, which will bring additional data types such as histopathologic data, genetic testing and other ’omics data. The Melanoma Clinical Outcomes Registry is another source of data.
Solution: Image storage, management and processing are performed within the XNAT Platform (Brisbane, Melbourne and Sydney). The integration of XNAT with SeRP and KeyPoint as part of this project will provide ACEMID with the ability to:
establish appropriate data governance at the scale required
use the Dermagraphix body mapping software and other analysis software in a secure environment, and
perform operations such as de-identification and data linkage.
Transformation: This will enable the establishment of a world-first teledermatology network based on 3D TBP, with the ultimate aim of translating the service (especially big-data AI capabilities) into standard clinical practice.
The Australian Longitudinal Study on Women’s Health (ALSWH)
Challenge: The Australian Longitudinal Study on Women’s Health (ALSWH) is a national data asset of multi-wave longitudinal survey data on women’s physical and mental health and their use of health services. ALSWH is core to multiple research initiatives, including the Centre for Research Excellence on Women and Non-communicable Disease (CRE WaND). The key challenge is to efficiently and securely provide the fundamental research data, and the related linked heterogeneous data sets, to a large number of specifically authorised researchers so they can conduct their individual research projects.
Data: The fundamental collection is made up of 32 data sets from four cohorts. Beyond the fundamental data, these studies use multiple heterogeneous national data sets that, depending on data custodian approval, can be linked to the fundamental data sets. There are currently 51 linkable data sets and 14 derived data sets in development.
Solution: SeRP and KeyPoint will provide the required scalable data governance and secure data analysis environment to meet data custodians’ requirements.
Transformation: In addition to greater geographic accessibility enabling more research outputs, the platform will drastically decrease the time for data custodian approval and increase the speed of the deployment for individual projects, enabling a faster start of the research and accelerating research outcomes.
ATLAS Indigenous Primary Care Surveillance Network
Challenge: The ATLAS Indigenous Primary Care Surveillance Network is a collaboration with multiple Aboriginal Community Controlled Health Organisations (ACCHOs) across Australia to better utilise service-level data and drive improvements to the way in which clinics screen, test and treat sexually transmissible infections (STIs) and blood-borne viruses (BBVs), as well as other vaccine-preventable diseases (VPDs). The ATLAS network works with ACCHOs to provide high-quality, evidence-based, best-practice clinical care resulting in improved health outcomes for Aboriginal and Torres Strait Islander peoples. Clinics have secure access to data aggregated at the ACCHO level. Research, however, requires secure access to network-wide disaggregated data in a way that also adheres to Indigenous data governance and Indigenous data sovereignty principles.
Data: ATLAS routinely acquires disaggregated and de-identified electronic medical records to extract STI, BBV and VPD data. There are 36 ACCHOs and five clinical hubs in the network, with ongoing expansion across Australia.
Solution: SeRP and KeyPoint will provide a platform through which our Indigenous data custodians can operationalise the principles of Indigenous data governance and Indigenous data sovereignty for researchers using ATLAS data. Furthermore, the platform can facilitate data linkage with other national STI collections.
Transformation: The secure research environment will:
expedite researcher access to and linkage with ATLAS network data, and
improve Indigenous data sovereignty and governance to contribute to the continuous quality improvement of ACCHOs’ STI, BBV and VPD service delivery for First Nations Peoples.
Global Drug Survey
Challenge: The Global Drug Survey is the world’s largest online survey of drug use. Each year, the survey is typically completed by more than 100,000 respondents, resulting in large multi-year data sets. A major challenge is making this data readily available for researchers from across the world to access, in a secure and governed way, whilst allowing for researchers to utilise their analytical software of choice.
Data: The Global Drug Survey data consists of multi-year cross-sectional data sets with more than 1500 variables captured per year, provided in 10 different languages. This includes special-issue data sets, such as our COVID-19 Special Edition, and special question-set modules on topics such as drink spiking, no/low alcohol consumption, changes in drug use behaviour given changes in country-level policies and laws, harms from identified new or emerging drugs, perceptions around drug checking and many more.
Solution: KeyPoint is the go-to solution for the Global Drug Survey. It provides the required scalable data governance and secure data analysis environment to meet data custodians’ requirements and support researchers from around the world.
Transformation: First, KeyPoint will permit gold-standard data sharing and data governance practice with full geographic accessibility. This, in turn, will allow collaborative research by international academics, accelerate research outputs and ultimately inform and change policies. Second, while GDS is an annual cross-sectional survey, the KeyPoint platform, with appropriate data governance, will allow for data harmonisation across the survey years, which will reduce the data management burden for the researcher and shorten the time between analysis, dissemination and research translation.