Abstract

Background/Aims: Integrating data across systems can be a daunting process. The traditional method of moving data to a common location, mapping fields with different formats and meanings, and performing data cleaning activities to ensure valid and reliable integration across systems can be both expensive and extremely time consuming. As the scope of needed research data increases, the traditional methodology may not be sustainable. Data Virtualization provides an alternative to traditional methods that may reduce the effort required to integrate data across disparate systems. Objective: Our goal was to survey new methods in data integration, cloud computing, enterprise data management and virtual data management for opportunities to increase the efficiency of producing VDW and similar data sets. Methods: Kaiser Permanente Information Technology (KPIT), in collaboration with the Mid-Atlantic Permanente Research Institute (MAPRI) reviewed methodologies in the burgeoning field of Data Virtualization. We identified potential strengths and weaknesses of new approaches to data integration. For each method, we evaluated its potential application for producing effective research data sets. Results: Data Virtualization provides opportunities to reduce the amount of data movement required to integrate data sources on different platforms in order to produce research data sets. Additionally, Data Virtualization also includes methods for managing “fuzzy” matching used to match fields known to have poor reliability such as names, addresses and social security numbers. These methods could improve the efficiency of integrating state and federal data such as patient race, death, and tumors with internal electronic health record data. Discussion: The emerging field of Data Virtualization has considerable potential for increasing the efficiency of producing research data sets. An important next step will be to develop a proof of concept project that will help us understand to benefits

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call