Stemformatics’ Hierarchical Cluster to handle more genomics data set types

Stemformatics, a data portal for stem cell researchers, has recently upgraded its Hierarchical Cluster (HC) analysis to handle more Next Generation Sequencing (NGS) data set types.

Previously, users could run this analysis only on the older microarray data sets. In response to strong community interest, the Stemformatics team made it a high priority to support Hierarchical Cluster analysis of other data types, such as RNA-Seq, small RNA and ChIP-Seq.

Stemformatics Project Manager Rowland Mosbergen said: “By far the most common question for a new Stemformatics user was why they couldn’t run a Hierarchical Cluster on their RNA-Seq data set. Now 25% of the HC analyses run with the new functionality are NGS data sets.”

Stemformatics now sends HC job details to any server running Galaxy, such as the Genomics Virtual Lab (GVL), via the Python library BioBlend. It has also added HC options to colour the resulting heatmap either by data set value or by values normalised per row (gene).
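To illustrate the difference between the two colouring options, here is a minimal Python sketch of per-row normalisation. The announcement does not say which normalisation Stemformatics applies, so the z-score used here is an assumption, and the function name `normalise_rows` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def normalise_rows(matrix):
    """Z-score each row (gene) independently.

    Assumption: z-scoring stands in for whatever per-row
    normalisation Stemformatics actually uses.
    """
    normalised = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)
        if sigma == 0:
            # A flat row carries no pattern; map it to all zeros.
            normalised.append([0.0 for _ in row])
        else:
            normalised.append([(value - mu) / sigma for value in row])
    return normalised


# Two hypothetical genes with the same pattern at very different scales.
expression = [
    [2.0, 4.0, 6.0],
    [100.0, 200.0, 300.0],
]
scaled = normalise_rows(expression)
```

Coloured by raw data set value, the second gene would dominate the colour scale. After per-row normalisation both genes show the same low-to-high pattern across samples.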

The clustering and image generation are done by ComplexHeatmap, a Bioconductor/R package that was installed as a Galaxy tool on a private server running on the Nectar cloud.
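For readers unfamiliar with hierarchical clustering, the general idea can be sketched in a few lines of Python. This is not ComplexHeatmap's code (ComplexHeatmap is an R package), and the Euclidean distance and average linkage used here are assumptions, since the announcement does not state which settings Stemformatics uses. The sample coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def average_distance(points, a, b):
    """Mean Euclidean distance between all members of clusters a and b."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def cluster(points):
    """Repeatedly merge the two closest clusters; return merges in order."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(points, *pair),
        )
        # Drop the two merged clusters and add their union.
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Four hypothetical samples forming two well-separated pairs.
samples = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
merges = cluster(samples)
```

The order of merges is what a dendrogram draws: the closest samples join first, and the two pairs are joined last.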

Stemformatics hopes to reuse this architecture to allow its other main analysis, the Gene Neighbourhood, to handle more NGS data set types. The Gene Neighbourhood allows users to ask, “What other genes in this data set act similarly to my gene of interest?”
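One common way to answer that kind of question is to rank every other gene by how closely its expression profile correlates with the gene of interest. The Python sketch below does this with Pearson correlation. The announcement does not describe how the Gene Neighbourhood measures similarity, so the choice of Pearson correlation is an assumption, and the gene names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # A flat profile has no defined correlation.
    return cov / (spread_x * spread_y)


def neighbours(expression, gene, top=2):
    """Return the `top` genes most correlated with `gene`."""
    target = expression[gene]
    scores = [
        (other, pearson(target, values))
        for other, values in expression.items()
        if other != gene
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical expression values across four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 5.9, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 4.0],
}
ranked = neighbours(expression, "GENE_A", top=2)
```

Here `GENE_B` ranks first because it rises in step with `GENE_A`, while `GENE_C`, which moves in the opposite direction, is excluded from the top two.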

This work has been part of the A1.2 Life Sciences Project, funded by Research Data Services and led by QCIF, which Stemformatics undertook with help from the GVL Queensland team.

Stemformatics is funded by the University of Melbourne’s Centre for Stem Cell Systems and Stem Cells Australia, a special Australian Research Council initiative.