LONG BEACH, CA - The era of big data is arriving in healthcare, and informatics will be the key to unlocking its potential, said Katherine Andriole, PhD, during the opening session of the Society for Imaging Informatics in Medicine (SIIM) annual meeting.
Informaticists, data scientists, and engineers will have big roles to play in healthcare and biomedical research, according to Andriole.
"Basic research at all aspects of the imaging chain needs to be done in terms of how to search data, clean data, store data, and visualize data," she said. "The hope is that we can take that basic research and translate it into clinical practice, so that we can have high-quality, safe, and effective healthcare across the world."
Andriole spoke Thursday morning during the annual Sam Dwyer Lecture, which honors legendary PACS researcher Dr. Sam Dwyer. She is an associate director of radiology at Harvard Medical School and the SIIM 2014 program committee chair.
Big data offers the potential to use large databases to recognize or predict patterns in data, even if the meaning of the patterns isn't yet understood.
"That is the promise that big data has for us to show," she said.
Challenges with big data
Several obstacles to using big data in healthcare and biomedical research still need to be overcome, including storage, data access for biomedical research, security and privacy, and visualization.
Although storage has become cheaper, the volume of data continues to grow. Access to data for biomedical research purposes is also poor, she said. Security and privacy, as well as the ability to visualize the data, also need improvement.
In the future, big data management will use cloud storage and virtual computing, along with, one hopes, continued gains in input/output throughput and instructions per second, Andriole said.
"Most of our computing will probably be distributed, and will have better databases for searching and better computing platforms," she said.
Current big data management platforms include Apache Hadoop, an open-source, cross-platform, Java-based framework for storing and processing large-scale data on clusters of commodity hardware. Andriole also pointed to a platform being developed at Boston Children's Hospital called Substitutable Medical Apps and Reusable Technology (SMART). SMART provides a standard language and a flexible information infrastructure aimed at facilitating innovation, and developers can build apps for it.
A SMART-enabled server knows how to get, reconcile, and aggregate data that has been put into the system.
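Hadoop's large-scale processing is commonly organized around the MapReduce pattern: mappers process separate splits of the data in parallel, the framework groups the intermediate results by key, and reducers combine each group. The sketch below simulates that pattern in plain Python on one machine, counting imaging studies per modality. It does not use Hadoop's actual Java API, and the study records and function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical study records; on a real cluster each split would live on
# a different node and be processed by its own mapper.
splits = [
    [{"study_id": "S1", "modality": "CT"}, {"study_id": "S2", "modality": "MR"}],
    [{"study_id": "S3", "modality": "CT"}, {"study_id": "S4", "modality": "CR"}],
]

def map_phase(records):
    """Emit a (key, 1) pair for each record, like a mapper on one split."""
    for record in records:
        yield record["modality"], 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key, like a reducer."""
    return {key: sum(values) for key, values in grouped.items()}

def count_by_modality(splits):
    """Run map over every split, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(mapped))

print(count_by_modality(splits))  # {'CT': 2, 'MR': 1, 'CR': 1}
```

Because each mapper only sees its own split, the map phase scales out across commodity machines; only the much smaller grouped intermediate results have to move between nodes.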
"The goal here is to transform healthcare into a data-driven enterprise much more than it is today," she said.
Data issues in healthcare and biomedical research include concerns over data integrity, curation, normalization (for example, reconciling patient identities), analysis, and visualization.
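Reconciling patient identities means recognizing that records from different systems, which may format the same name differently, belong to the same person. The minimal sketch below assumes a simple deterministic rule: normalize the name and pair it with the date of birth to form a match key. The records, field names, and matching rule are hypothetical; production systems such as enterprise master patient indexes typically use richer, often probabilistic, matching.

```python
import unicodedata
from collections import defaultdict

# Hypothetical records from two source systems describing overlapping patients
records = [
    {"source": "RIS", "name": "Smith, John", "dob": "1960-04-02", "exam": "CT chest"},
    {"source": "EHR", "name": "JOHN SMITH", "dob": "1960-04-02", "exam": "Lipid panel"},
    {"source": "EHR", "name": "Jane Doe", "dob": "1975-11-30", "exam": "MR brain"},
]

def normalize_name(name):
    """Strip accents, reorder 'Last, First', lowercase, and collapse whitespace."""
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.lower().split())

def match_key(record):
    """Deterministic match key (an assumption): normalized name plus date of birth."""
    return (normalize_name(record["name"]), record["dob"])

def reconcile(records):
    """Group the exams of all records that share a match key under one patient."""
    patients = defaultdict(list)
    for record in records:
        patients[match_key(record)].append(record["exam"])
    return dict(patients)

print(len(reconcile(records)))  # 2
```

Here the RIS and EHR entries for John Smith collapse into a single patient whose imaging and lab results can then be aggregated together.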
In healthcare, there can be missing/sparse data, unstructured data, multiscale data, complex data, longitudinal data, and noisy data, she said.
"It's going to be a very good time for informaticists and data scientists," she noted. "We will be needed going forward."
Business analytics
While work has been done in retrospective data analytics in healthcare, the future will bring more direct interventions and greater predictive capability driven by analytics, Andriole said. A tremendous amount of work remains to be done in image processing and analysis.
"We've been talking a lot about metadata and analyzing that, but we need to get the pixel data content and be able to search that," she said. "Right now, this is a huge problem; how do we tag features in images, and how do we make the information ... in radiology quantitative?"
Research is underway internationally to investigate different kinds of algorithms, as well as machine learning and signal processing, in these areas.
"There is a huge effort in quantitative imaging, which is looking at big data and trying to connect genetic data, patient history, labs, social history, and imaging data," she said. "To create algorithms to work on this data, people need to be able to share large amounts to be able to test their algorithms and see if they're getting the same answers with their algorithms on that data."
Quantitative measures are also being explored via international collaborative activities, and this is an area radiology needs to focus on going forward, Andriole said.
Pixel data contain a great deal of information that could be exploited for decision support, including computer-aided detection (CAD), alerts, and evidence-based patient management. This information could also be used to personalize diagnosis and treatment protocols for individual patients.
Visualization techniques (i.e., presenting data results in more effective formats) are also important for uncovering the message in the data, she said.
Healthcare big data
Healthcare big data includes structured electronic health record data, unstructured clinical notes and reports, medical imaging data, genetic data, behavioral and social data, epidemiological data, and evidence-based practice data.
In healthcare, big data enables a continuous cycle: data samples are collected, analyzed, and stored; patients are treated; and the data are then refined, with each turn of the cycle accelerating the process, Andriole said.
"The promise of healthcare is safety and high quality of care for everyone, efficiency and cost-effective, and predictive analytics," she said. "We will see more and more moving to personalized medicine and, in fact, quantitative or precision medicine and doing evidence-based decision support at the point of care."
Promising research areas to tackle include data and information access, data normalization, and context-sensitive integration, particularly in the clinical environment, she said.
Biomedical research
The Enhancing Neuroimaging Genetics Through Meta-Analysis (ENIGMA) consortium is an example of the power of big-data mining in biomedical research. An international network encompassing 287 scientists across 125 institutions and 28 countries, ENIGMA is bringing together researchers in imaging genomics to better understand brain structure, function, and disease based on brain imaging and genetic data.
ENIGMA provides a large collection of shared datasets, along with machine learning and search tools for interrogating data worldwide, and uses computing resources and big data to accelerate discovery, according to Andriole.
She closed her talk by encouraging SIIM attendees to be like the late Sam Dwyer.
"Work across disciplines such as engineering, informatics, physics, and biomedicine," she said.
It is also important to understand the application environment, engage with industry partners, and collaborate to solve problems, she said.
"Think of the possibilities, and enjoy the journey," Andriole said.