Abstract
As many personal genomes are being sequenced, collaborative analysis of those genomes has become essential. However, analysis of personal genomic data raises important privacy and confidentiality issues. We propose a methodology for federated analysis of sequence variants from personal genomes. Specific base-pair positions and/or regions are queried both for samples to which the user has access and for the whole population. The resulting statistics do not breach data confidentiality but allow further exploration of the data; researchers can negotiate access to relevant samples through pseudonymous identifiers. This approach minimizes the impact on data confidentiality while enabling powerful data analysis by giving researchers access to important rare samples. Our methodology is implemented in an open source tool called NGS-Logistics, freely available at https://ngsl.esat.kuleuven.be.
Highlights
Next-generation sequencing (NGS) is a key tool in genomics, in particular to study inherited and acquired human genetic disorders [1].
All functionalities of NGS-Logistics will be illustrated by querying one gene. Another example will demonstrate how NGS-Logistics can help in interpreting variants.
Use case one: We use the example of SMARCA2, a gene located on chromosome 9 whose heterozygous mutation causes Nicolaides-Baraitser syndrome [17].
Summary
Next-generation sequencing (NGS) is a key tool in genomics, in particular to study inherited and acquired human genetic disorders [1]. A potential solution to manage access to personal genomic data is the use of access control lists (ACLs).
Implementation: The NGS-Logistics application package consists of three modules: 1) administration; 2) query manager; 3) primary user interface.
Data sets can be private or public. (Note that pseudonymization does not stop the data from being personal, because the genome or exome sequence is unique to each individual.) The system uses the ‘PI Name’ and ‘Sample Type’ fields to recognize the type of each sample (Research or Diagnostics) and assigns each sample to its owner (PI Name). Since samples can be aligned to different builds of the human genome, the system uses the build information to aggregate query results.
Since processing time depends on the facilities and workload of each center, we cannot estimate the duration of each query, but we display the progress by center.
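The access-controlled, build-aware aggregation described above can be illustrated with a minimal sketch. It is not the NGS-Logistics implementation: the `Sample` class, the `query_center` and `merge_results` functions, the ownership-or-public access rule, and the example position are all hypothetical stand-ins chosen for illustration. The sketch only shows the general idea that each center returns counts rather than genotypes, and that counts are kept separate per genome build.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    pseudonym: str       # pseudonymous identifier, never a patient name
    owner: str           # stands in for the 'PI Name' field
    build: str           # reference build the sample was aligned to
    public: bool         # public data sets are visible to every user
    variants: frozenset  # {(chromosome, position), ...} for this sample


def query_center(samples, user, chrom, pos):
    """Run the query inside one center and return only per-build counts."""
    stats = defaultdict(lambda: defaultdict(int))
    for s in samples:
        counts = stats[s.build]
        counts["samples"] += 1
        if (chrom, pos) in s.variants:
            counts["carriers"] += 1
            # Simplified ACL (assumption): owners and public data sets only.
            if s.public or s.owner == user:
                counts["accessible_carriers"] += 1
            else:
                counts["accessible_carriers"] += 0  # keep the key present
    return {build: dict(counts) for build, counts in stats.items()}


def merge_results(center_results):
    """Sum the per-center counts, keeping genome builds separate."""
    merged = defaultdict(lambda: defaultdict(int))
    for result in center_results:
        for build, counts in result.items():
            for key, value in counts.items():
                merged[build][key] += value
    return {build: dict(counts) for build, counts in merged.items()}


# Hypothetical data: the position below is made up, not a real SMARCA2 variant.
VARIANT = ("9", 2000000)
center_a = [
    Sample("A1", "alice", "hg19", False, frozenset({VARIANT})),
    Sample("A2", "bob", "hg19", False, frozenset({VARIANT})),
    Sample("A3", "bob", "hg38", True, frozenset()),
]
center_b = [
    Sample("B1", "carol", "hg19", False, frozenset({VARIANT})),
]

merged = merge_results(
    query_center(center, "alice", *VARIANT) for center in (center_a, center_b)
)
for build, counts in sorted(merged.items()):
    print(build, counts)
```

In this sketch the user "alice" learns how many carriers exist in the whole population for each build, but only one of them counts as accessible to her. Grouping by build matters because the same coordinate can refer to different bases in different reference builds, so counts from different builds are not summed together.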