Research Impact

The researcher who tinkers and tailors with Crabspy

Summary

A James Cook University coastal ecology PhD student has developed a digital toolbox, built in Python, to “spy” on intertidal crabs in their natural environment.

Cesar Herrera Acosta used QRIScloud in his work to create 'Crabspy'. He also had support from QCIF's JCU-based eResearch Analyst and the helpers at JCU's Hacky Hour. 

Full article

26 February 2020

Species extinction rates are at their highest in modern times, and natural scientists face the challenge of rapidly increasing their efforts to gather reliable ecosystem information at broader scales in order to mitigate threats.
 
Traditional methods of collecting ecological data are often time-consuming and invasive, and can alter the natural habitat of the study site.
 
With this in mind, a James Cook University coastal ecology PhD student has developed an alternative scientific workflow that uses computer vision and machine learning to collect biological and ecological data. The workflow aims to scale up data collection to the required levels and improve its efficiency and utility. To do so, he used a node on QRIScloud, QCIF’s cloud service, that was specially designed for machine learning work.
 
Using intertidal crabs as his test case, Cesar Herrera Acosta developed Crabspy, a heuristic digital toolbox built in Python to “spy” on the crabs in their natural environment.
 
“By spying, I mean furtively collecting information about the functional biology and ecology of crabs, so things like species identity, movement patterns, change in colouration, feeding, bioturbation rates and more,” said Cesar.
 
Crabspy aims to accelerate and improve information collection, enabling rapid, actionable scientific and policy responses through a faster data pipeline.
 
It will benefit scientists working on a wide range of species (such as crabs, molluscs, ants and small reptiles) who are facing similar challenges using current sampling techniques, or those who are time-constrained in the field or laboratory.
 
Cesar also hopes Crabspy will inspire and encourage other ecologists to design fit-for-purpose software to improve data collection by increasing reproducibility and reducing observation bias.
 
“While Crabspy, along with similar computer vision and learning initiatives, has the potential to change the current sampling and observation techniques in crabs (and other species), the major impact would likely be in changing our current paradigm as new sampling techniques allow us to gain access to richer and bigger data sets, offering the opportunity to discover new patterns and develop new analytical means,” he said.

Crabspy and QRIScloud

To develop Crabspy, Cesar used QRIScloud to prototype and test off-the-shelf deep learning models that identify crab species and crab behaviour in images.
 
When his project began in late 2016, Cesar did not have local access to a high-performance computer with the graphics processing units (GPUs) he needed for machine learning.
 
Dr Collin Storlie, a JCU-based QCIF eResearch Analyst at the time, connected Cesar to a QRIScloud GPU node.
 
“This project would not have been feasible without QRIScloud,” said Cesar.
 
“Using the QRIScloud special computing instance with GPU, I was able to achieve my testing in two weeks,” he said.
 
“QRIScloud allowed me to rapidly prototype and train various off-the-shelf machine learning algorithms in the early stages of Crabspy. This permitted me to gain early insight about which algorithms were better suited to different tasks and also to evaluate the computing resources I would need to progress my project.
 
“In addition to the benefits for my project, using QRIScloud was equally important as an encouraging and motivating personal experience. As an ecologist with limited experience in software development and computing sciences, one of the main benefits of using the special compute GPU node from QRIScloud was the friendly and technical advice from both the [QCIF] JCU-based eResearch Analyst and the QRIScloud Help Desk. It was fantastic to receive support from knowledgeable analysts willing to enable innovative ideas and enhance user expertise.”
 
Although GPU nodes are not currently available in QRIScloud, researchers can access this technology via either their institutional HPC offerings or commercial cloud services. Researchers can work with their local QCIF eResearch Analyst to arrange access.

Hacky Hour

Cesar also benefited from the regular Hacky Hour that Dr Collin Storlie once hosted at JCU’s Townsville campus.
 
It was there that Collin encouraged Cesar to learn Python (a programming language new to the PhD student) and develop his own solutions to his research problems.
 
“Hacky Hour was such a great initiative for bouncing and discussing ideas with others and developing my coding abilities,” said Cesar.
 
Hacky Hour at JCU returned in February 2020, hosted by QCIF’s new JCU-based eResearch Analyst, Chantelle Pinnington.

The future of Crabspy

Crabspy is in continuous development, with innovations combining fit-for-purpose and off-the-shelf software.
 
Cesar readily acknowledges the toolbox has room for improvement, largely in code hygiene: tidying the code, improving efficiency, reducing redundancy, adding unit tests and improving documentation.
 
“I hope computing enthusiasts and researchers feel attracted to collaborate in this initiative, not simply to improve this toolbox but to create research synergy, combining efforts to accomplish novel multidisciplinary and interdisciplinary research,” he said.
 
A manuscript describing Crabspy and its scientific workflow is currently being prepared and will soon be available on the open-access biology preprint server bioRxiv.
 
Cesar’s research was supported by his Australian Government Research Training Program (RTP) Scholarship. His project was funded by a Holsworth Wildlife Research Endowment from the Equity Trustees Charitable Foundation, the Ecological Society of Australia, and a JCU College of Science and Engineering joint research training grant.

In Brief:
 
Researcher:
Cesar Herrera Acosta
PhD student
Science Integrated Coastal Ecosystem Management
James Cook University

Research community:
Marine ecology

Resources used: 

  • QRIScloud (total allocation):
    • GPU node: 5,157 CPU hours
  • JCU eResearch HPRC (JCU’s HPC)
    • Compute: 2 vCPUs, 16 GB
    • Storage: 3 TB