Abstract
Objective
The Welsh Longitudinal General Practice (WLGP) dataset contains over 4 billion records. Because of its size, and because the dataset must be filtered to extract the relevant general practice interactions, query performance is poor. To overcome this, a 'cleaned' version of the dataset was needed.
Approach
An R package was created to be run whenever a new refresh of the WLGP data is provided. It creates two new tables based on the original WLGP events table.
Results
The WLGP Cleaned Dataset R package produces a reformatted events table and an events lookup table. The reformatted GP events table reduces the original events table from 14 columns to 8 key columns. The events lookup table takes the distinct events from the original WLGP events table and adds columns such as the event description, type and hierarchy levels. The two tables are linked by an event code ID.
Conclusion
The R package is now run after each WLGP refresh to create the reformatted events table and the events lookup table. The new tables are then provisioned to every project that has access to the dataset.
Implications
The new tables have improved query performance for our researchers, giving them a single table with all the necessary information and an easy way to decipher event codes. This has reduced the time researchers spend manipulating and querying the tables.
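The split into a slim events table and a lookup table linked by an event code ID can be illustrated with a small sketch. This is not the R package's code: it uses Python with an in-memory SQLite database, and every table name, column name, event code and row below is hypothetical, chosen only to show the general pattern of taking distinct events, assigning each an ID, and joining back on that ID.

```python
import sqlite3

# Hypothetical raw rows: (person_id, event_date, event_cd, description, event_type).
# The codes and descriptions are invented for illustration only.
RAW_ROWS = [
    (1, "2020-01-05", "A100", "Consultation", "encounter"),
    (1, "2020-02-11", "B200", "Blood pressure reading", "observation"),
    (2, "2020-01-07", "A100", "Consultation", "encounter"),
    (3, "2020-03-01", "B200", "Blood pressure reading", "observation"),
]


def build_cleaned_tables(conn, raw_rows):
    """Create a lookup of distinct events and a slim events table keyed by ID."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE raw_events (person_id INTEGER, event_date TEXT, "
        "event_cd TEXT, description TEXT, event_type TEXT)"
    )
    cur.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?, ?)", raw_rows)

    # Lookup table: one row per distinct event, with an auto-assigned ID.
    cur.execute(
        "CREATE TABLE event_lookup (event_cd_id INTEGER PRIMARY KEY, "
        "event_cd TEXT UNIQUE, description TEXT, event_type TEXT)"
    )
    cur.execute(
        "INSERT INTO event_lookup (event_cd, description, event_type) "
        "SELECT DISTINCT event_cd, description, event_type "
        "FROM raw_events ORDER BY event_cd"
    )

    # Slim events table: descriptive columns are replaced by the event ID.
    cur.execute(
        "CREATE TABLE events_clean AS "
        "SELECT r.person_id, r.event_date, l.event_cd_id "
        "FROM raw_events r JOIN event_lookup l ON l.event_cd = r.event_cd"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_cleaned_tables(conn, RAW_ROWS)

# Decode events by joining the slim table back to the lookup on the ID.
decoded = conn.execute(
    "SELECT e.person_id, e.event_date, l.description "
    "FROM events_clean e JOIN event_lookup l USING (event_cd_id) "
    "ORDER BY e.person_id, e.event_date"
).fetchall()
```

In this arrangement the descriptive text is stored once per distinct event rather than once per record, and a query reads the narrower events table and joins to the lookup only when it needs a description.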