eResearch Tips

Got spreadsheet or Web-scraped data you need to clean up or transform?

By Nick Hamilton, QCIF eResearch Analyst at The University of Queensland 

Perhaps you have collected survey data where people have given different responses that mean the same thing, or where some people have responded more than once.

Or you have scraped data from multiple sources on the Web and the formatting is inconsistent.

Then OpenRefine may help.

OpenRefine is a powerful, easy-to-use, free tool for importing, cleaning, merging and transforming badly formatted data into something beautiful you can actually use! 

OpenRefine maintains lists of simple examples and recipes for common data-cleaning tasks: removing duplicate rows, removing unwanted punctuation, reformatting dates, finding bad data, scraping and formatting data from the Web, or applying multiple filters to your data to extract and explore the rows with the properties you are interested in.
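
To make a few of these tasks concrete, here is a minimal sketch in plain Python of what removing duplicate rows, stripping punctuation, reformatting dates and flagging bad data amount to. It is not OpenRefine's own recipe or syntax; the sample rows, the function names and the two accepted date formats are all assumptions made up for illustration.

```python
import string
from datetime import datetime

# Hypothetical survey rows: (respondent, answer, date as typed)
rows = [
    ("alice", "Yes!", "03/02/2021"),
    ("bob", "yes", "2021-02-04"),
    ("alice", "Yes!", "03/02/2021"),  # exact duplicate of the first row
]

def strip_punctuation(text):
    """Remove ASCII punctuation and surrounding whitespace."""
    return text.translate(str.maketrans("", "", string.punctuation)).strip()

def to_iso_date(text, formats=("%d/%m/%Y", "%Y-%m-%d")):
    """Try each assumed format in turn; return None to flag bad data."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(rows):
    """Normalise every row, then keep only the first copy of each."""
    seen = set()
    cleaned = []
    for name, answer, date in rows:
        record = (name, strip_punctuation(answer).lower(), to_iso_date(date))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned

cleaned_rows = clean(rows)
```

In a tool like OpenRefine these steps are done through the interface rather than by writing a script, but the underlying idea is the same: normalise each value first, then compare rows.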

At its simplest, OpenRefine can be used through a relatively intuitive user interface. For more advanced users, there are powerful tools such as programming language access and regular expressions for sophisticated search, substitution and data transformation.
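
As a rough illustration of the kind of search and substitution regular expressions make possible, the sketch below collapses stray whitespace and expands an abbreviation so that variant spellings end up identical. It is ordinary Python rather than OpenRefine's own expression syntax, and the function name and the abbreviation being expanded are assumptions chosen for the example.

```python
import re

def normalise(value):
    """Collapse runs of whitespace and expand 'Uni'/'Univ.' to 'University'."""
    value = re.sub(r"\s+", " ", value).strip()
    # \b stops the pattern matching inside longer words such as "University"
    value = re.sub(r"\b(?:uni|univ)\b\.?", "University", value, flags=re.IGNORECASE)
    return value

variants = ["  Univ.   of  Queensland ", "uni of Queensland", "University of Queensland"]
normalised = [normalise(v) for v in variants]
```

After this step all three variants read "University of Queensland", so they can be counted or merged as a single response.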

If you get stuck, there are friendly user forums.

I highly recommend OpenRefine.