Abstract

Big data analytics traditionally involves downloading massive datasets to a common server or cluster for processing. The analytics process slows as the volume of required data grows and as network conditions degrade. Data scientists also need explicit access to the data locations to download the required data, and such access may not always be granted for security reasons. To simplify and accelerate analytics on distributed big data while taking security into account, we proposed the Virtual Information Fabric Infrastructure (VIFI) for data-driven discoveries. Instead of moving large amounts of data to a common place for processing, VIFI automatically transfers the required analytics programs to the distributed data locations for in-place processing of the relevant data. VIFI allows data scientists to conduct and coordinate complex analytics processes on distributed data repositories using containerization technology and open-source workflow design tools. VIFI relieves users of the need for detailed knowledge of the distributed data locations, as well as of the dependencies, installation, and configuration of analytical libraries. In this paper, we present our current and future work to improve the VIFI architecture: previous and additional use cases, a data management layer that simplifies the search for relevant datasets by adding metadata, integration with the security policies of different institutions through the proposed VIFI security layer, and a user-friendly web interface for carrying out the different VIFI activities.
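
The idea of moving the analytics program to the data, rather than the data to the program, can be illustrated with a minimal sketch. This is not VIFI's implementation or API: the `DataSite` class, the `run_in_place` method, and the `coordinate` function are hypothetical names introduced only for illustration, and the containers and workflow tools mentioned above are replaced here by plain function calls. Each site computes a small partial result on its own records, and only those partial results travel back to the coordinator.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A partial result is (sum, count); it is small regardless of dataset size.
Partial = Tuple[float, int]


@dataclass
class DataSite:
    """Hypothetical stand-in for a remote data repository."""
    name: str
    records: List[float]  # in this sketch, the raw data never leaves the site

    def run_in_place(self, program: Callable[[List[float]], Partial]) -> Partial:
        # Assumption: a real system would ship a containerized program here;
        # this sketch simply calls the function on the local records.
        return program(self.records)


def partial_mean(records: List[float]) -> Partial:
    """Analytics program sent to each site: returns a local sum and count."""
    return (sum(records), len(records))


def coordinate(sites: List[DataSite],
               program: Callable[[List[float]], Partial]) -> float:
    """Send the program to every site and combine the partial results."""
    partials = [site.run_in_place(program) for site in sites]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("no records available at any site")
    return total / count


sites = [
    DataSite("site_a", [1.0, 2.0, 3.0]),
    DataSite("site_b", [4.0, 5.0]),
]
global_mean = coordinate(sites, partial_mean)
```

In this sketch the coordinator receives one `(sum, count)` pair per site, so the amount of data transferred does not depend on how many records each site holds.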
