Using Big Data Meaningfully to Improve Quality and Safety
By Hardeep Singh, MD, MPH; and Dean F. Sittig, PhD
It’s rare to attend a conference on quality, safety, or informatics without feeling the excitement of “big data,” a loose term referring to large volumes of interconnected (and often unverified) data that may be updated and processed rapidly. Big data is often defined by the 5 V’s:
- Volume: The sheer amount of data to be processed
- Velocity: The rate at which the data must be processed to keep up with the inflow
- Variety: The diversity of data sources, types, dimensions, and time scales that must be managed
- Veracity: The data’s level of accuracy and trustworthiness
- Value: The significance, or the meaning, of the data and the potential relationships it represents, which is perhaps the most important V of big data
With wide-scale implementation of electronic health records (EHRs), interconnectivity between different systems of care, and new patient-based sensor technology, the prospect of using big data to discover new relationships is real. For quality and safety researchers, using big data for surveillance of missed opportunities in care is a dream come true (Murphy et al., 2014). In this article, we discuss some of the challenges that must be addressed to leverage big data to improve quality and safety at the point of care.
Promise of big data
The promise of big data is the ability to identify significant events and nascent risks by combining data (e.g., the record of events, diagnoses, test results, and treatments) gathered through multiple sources and methods, often across disparate organizations. This massive merging of data sources and types may reveal a longitudinal picture of a patient, illuminating important trends or gaps in care. However, sharing data across organizations and ensuring that the information is accurate remains a challenge. In particular, matching data from the same patient across organizations is difficult and susceptible to errors (Patient identification, 2014).
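To make the matching problem concrete, the following minimal sketch scores two demographic records for similarity. It is an illustration only: the field names, weights, and example records are our assumptions, and real master patient index systems use far more sophisticated probabilistic linkage.

```python
# Minimal sketch of cross-organization patient matching (illustrative only).
# Field names, weights, and records are assumptions, not any vendor's algorithm.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted demographic similarity between two patient records."""
    score = 0.0
    score += 0.35 * similarity(rec_a["last_name"], rec_b["last_name"])
    score += 0.25 * similarity(rec_a["first_name"], rec_b["first_name"])
    score += 0.25 * (1.0 if rec_a["dob"] == rec_b["dob"] else 0.0)
    score += 0.15 * similarity(rec_a["zip"], rec_b["zip"])
    return score

hospital_a = {"last_name": "Smith", "first_name": "John", "dob": "1950-03-01", "zip": "77030"}
hospital_b = {"last_name": "Smyth", "first_name": "Jon",  "dob": "1950-03-01", "zip": "77030"}

print(f"match score: {match_score(hospital_a, hospital_b):.2f}")  # ~0.89
```

A single-character typo in the surname leaves the score in an ambiguous band: high enough to be the same person, low enough to be someone else. Wherever the cutoff is set, some true matches will be missed and some false matches accepted.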
Further, data are often collected haphazardly, and methods to encode or map similar clinical concepts from one standard vocabulary to another have shortcomings. Even when researchers are able to bring the data together, it is often difficult to understand what really happened to the patient. There are exemplars in this area, though, including Kaiser Permanente and the Department of Veterans Affairs (VA). For example, by developing its Corporate Data Warehouse (CDW), the VA is systematically refining common data definitions and adding essential data elements to develop a more comprehensive picture of its patients (Fihn et al., 2014).
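A toy example shows how even a simple vocabulary map leaks data. The local codes below are invented for illustration (the LOINC targets are real, but any production mapping involves thousands of codes and is far messier):

```python
# Toy illustration of mapping local codes to a standard vocabulary.
# The local codes are invented; real local-to-LOINC maps are much larger.
LOCAL_TO_STANDARD = {
    "GLU-FAST": "1558-6",   # fasting glucose (LOINC)
    "HGBA1C":   "4548-4",   # hemoglobin A1c (LOINC)
    # "K-SERUM" has no entry: a typical gap that silently drops data.
}

def map_codes(local_codes: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return (mapped, unmapped); unmapped codes need human review."""
    mapped, unmapped = {}, []
    for code in local_codes:
        target = LOCAL_TO_STANDARD.get(code)
        if target is None:
            unmapped.append(code)
        else:
            mapped[code] = target
    return mapped, unmapped

mapped, unmapped = map_codes(["GLU-FAST", "K-SERUM", "HGBA1C"])
print(mapped)    # {'GLU-FAST': '1558-6', 'HGBA1C': '4548-4'}
print(unmapped)  # ['K-SERUM']  # lost unless someone curates the map
```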
Data are being generated and stored at an unprecedented rate as a by-product of myriad digital transactions, such as order entry, admission, discharge, transfer, and procedure recording. Additionally, patients themselves are generating data, sometimes in conjunction with new omnipresent monitoring technologies. Paired with this outpouring of data is an enthusiasm for discovering relationships and using them to inform forecasts.
Challenges of big data
While expanding access to data has real promise, this tsunami of information will also create unintended consequences. Our work has shown that almost a third of providers currently miss abnormal test results in their EHRs due to information overload (Singh, Spitzmueller, Petersen, Sawhney, & Sittig, 2013). These new information sources, if not carefully managed, are almost certain to add to clinicians' information-processing burden. Thus, those who develop and deploy big data–based discoveries and solutions should ensure that information delivery fits the recipient's workflow and does not interrupt it unnecessarily. Much of this information should be routed to members of the health care team other than frontline clinicians, such as care managers, quality and safety personnel, or even new staff roles dedicated to handling these data.
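As one illustration of such distribution, a simple triage rule can route lower-acuity notifications away from the ordering clinician's inbox. The roles, severity levels, and routing logic below are our assumptions for illustration, not a deployed system:

```python
# Illustrative triage of test-result notifications to reduce clinician
# inbox load. Roles, severity levels, and rules are assumptions.
from dataclasses import dataclass

@dataclass
class ResultAlert:
    patient_id: str
    test: str
    severity: str  # "critical", "abnormal", or "borderline"

def route(alert: ResultAlert) -> str:
    """Send only the highest-acuity alerts to the ordering clinician."""
    if alert.severity == "critical":
        return "ordering_clinician"      # interrupt: needs action now
    if alert.severity == "abnormal":
        return "care_manager"            # track and close the loop
    return "quality_safety_dashboard"    # aggregate for trend review

alerts = [
    ResultAlert("p1", "potassium", "critical"),
    ResultAlert("p2", "hemoglobin", "borderline"),
    ResultAlert("p3", "creatinine", "abnormal"),
]
for a in alerts:
    print(a.test, "->", route(a))
```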
Another challenge will be to ensure that the use of big data actually improves quality and safety, the patient and clinician experience, and efficiency. Since these value-driven aims of big data analytics go well beyond the original purposes of the data, distinguishing signal from noise is essential. Dedicated analytics teams should include highly trained mathematicians, computer scientists, and informaticians supported by both frontline clinicians and quality and safety administrators; these supporting staff members can help ensure that the information gleaned from the data is correct, actionable, and able to be delivered to the right person, at the right time, in the right format. Teams must take care to avoid bias, confounding factors, and spurious associations in their attempts to identify meaningful relationships and assign causation. Retrospective, observational study designs have significant limitations, especially for determining causation or even identifying preventable events. Therefore, predictive models based on previously collected data should be tested prospectively whenever possible.
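One inexpensive approximation of prospective testing is temporal validation: train the model only on encounters before a cutoff date and evaluate it only on encounters after it. A minimal sketch, assuming scikit-learn is available and using synthetic data in place of a real timestamped clinical dataset:

```python
# Minimal sketch of temporal validation: train on earlier encounters,
# evaluate on later ones, mimicking prospective use. Assumes scikit-learn;
# the synthetic data stands in for a timestamped clinical dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
timestamps = np.sort(rng.uniform(0, 365, n))           # day of encounter
X = rng.normal(size=(n, 5))                            # stand-in features
y = (X[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)

# Split by time, not at random: everything after the cutoff is "future".
cutoff = 300
train, test = timestamps < cutoff, timestamps >= cutoff

model = LogisticRegression().fit(X[train], y[train])
auc = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])
print(f"AUC on future encounters: {auc:.2f}")
```

Splitting by time rather than at random exposes the model to the distribution shifts (new order sets, changed documentation habits, different patient mix) that will degrade it in actual deployment.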
The final challenge is operationalizing a regulatory, financing, and policy framework to optimize the use of big data for quality and safety improvement. Managing the trade-off between individual privacy rights and the potential societal benefits from this research continues to be a challenge.
While the use of big data in health care has the potential to improve the quality, safety, and efficiency of patient care, much work remains to unlock that potential. This work must be supported with dedicated funding, new types of personnel and information governance structures, and a robust, high-capacity information technology infrastructure.
Hardeep Singh is chief of the Health Policy, Quality & Informatics Program at the VA Health Services Research Center of Innovation based at the Michael E. DeBakey VA Medical Center and Baylor College of Medicine, Houston. He is on Twitter @HardeepSinghMD and may be contacted at hardeeps@bcm.edu.
Dean F. Sittig is the Christopher Sarofim Family Professor of Biomedical Informatics and Engineering at the University of Texas Health Science Center at Houston's School of Biomedical Informatics and the UT-Memorial Hermann Center for Healthcare Quality & Safety in Houston. He is on Twitter @DeanSittig and may be contacted at Dean.F.Sittig@uth.tmc.edu.
The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.
References
Fihn, S. D., Francis, J., Clancy, C., Nielson, C., Nelson, K., Rumsfeld, J. … Graham, G. L. (2014). Insights from advanced analytics at the Veterans Health Administration. Health Affairs (Millwood), 33(7), 1203–1211. doi: 10.1377/hlthaff.2014.0054
Murphy, D. R., Laxmisan, A., Reis, B. A., Thomas, E. J., Esquivel, A., Forjuoh, S. N., … Singh, H. (2014). Electronic health record-based triggers to detect potential delays in cancer diagnosis. BMJ Quality & Safety, 23(1), 8–16. doi: 10.1136/bmjqs-2013-001874
Patient identification. (2014, February 24). Retrieved from https://www.healthit.gov/safer/guide/sg006
Singh, H., Spitzmueller, C., Petersen, N. J., Sawhney, M. K., & Sittig, D. F. (2013). Information overload and missed test results in electronic health record-based settings. JAMA Internal Medicine, 173(8), 702–704. doi: 10.1001/2013.jamainternmed.61
Note: This article is updated and adapted from “Using Big Data Meaningfully to Improve Quality and Safety at the Point of Care” by Hardeep Singh, MD, MPH; and Dean Sittig, PhD, Veterans Affairs Health Services Research and Development Service Forum: Research Highlights, October 2014.