Diving Deeper into Big Data
This fall, associate professor of statistics Martin Slawski will launch a project to develop software for analyzing and linking disparate data sets, giving researchers and others more effective options for mining Big Data.
The project is funded by the National Science Foundation's Office of Advanced Cyberinfrastructure. It provides Slawski, his partners at Brown University and Johns Hopkins University, and their graduate students with $600,000 over the next three years to develop a new approach to combining data from different sources, such as survey data and administrative data, into more comprehensive data sets. The approach links records using inexact identifiers such as names, demographic information, addresses or even more ambiguous commonalities.
With some estimates placing the amount of data that humans create at around 2.5 quintillion bytes every day, making effective use of that data is becoming vastly more difficult. Slawski has already applied his methods to identifying systematic biases in the criminal justice system by comparing data generated at different stages of the criminal justice process. He also plans to use them to help the National Institutes of Health's National Institute on Aging better understand the healthcare outcomes of programs like Meals on Wheels.
"Often you don't have all the information you need in a single file, so rather than re-collect that information, which is costly, the idea is to use the information you have and link it," Slawski said.
A statistician and a computer scientist, Slawski brings expertise from both fields to his work on making data analysis more accurate, so that it produces findings with greater validity while minimizing security and privacy risks.
“Both fields have very different perspectives on data analysis, but when you bring them together, they can be very complementary,” Slawski said. “It’s important to see the problem from both angles.”