
Data storage and retrieval with unnatural proteins expressed via E. coli.

Abstract

Data storage using proteins offers high capacity and stability, enabling utilization of protein techniques for data storage and retrieval. However, expressing unnatural proteins with random sequences for data storage and sequencing them for accurate data retrieval remain challenging. In this study, by encoding digital data into amino acid sequences and incorporating them into collagen-like protein templates, we achieve successful expression of the proteins via E. coli for data storage; the data-bearing proteins containing selective amino acids and arginine intervals can be sequenced through tryptic digestion followed by LC-MS/MS analysis to achieve complete data recovery, even for protein mixtures encoding multiple datasets. We further demonstrate much higher stability of the data-bearing protein than DNA, and random access and cryptographic data protection using affinity-tagged proteins. This work establishes a robust framework for protein-based data storage, opening up avenues for data storage and retrieval, protein engineering and chemistry, synthetic biology, proteomics, and beyond.
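
The encoding idea in the abstract (digital data mapped to amino acids, with arginine intervals so that trypsin cuts the protein into sequenceable peptides) can be sketched as below. This is a minimal illustration, not the authors' scheme: the 8-residue alphabet, the 3-bits-per-residue mapping, the block length, and the function names `encode`, `digest`, and `decode` are all assumptions. The alphabet excludes lysine (K) and arginine (R), since trypsin cleaves after both. A real digest also loses peptide order, which this sketch sidesteps by keeping the split order.

```python
# Hypothetical 8-residue alphabet (3 bits per residue); K and R are excluded
# because trypsin cleaves after both, and R is reserved as the spacer.
ALPHABET = "ADEFGLSV"
SPACER = "R"
BITS_PER_RESIDUE = 3


def encode(data: bytes, block: int = 8) -> str:
    """Map bytes to residues and append an arginine after every `block` residues."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % BITS_PER_RESIDUE)  # pad to a whole residue
    residues = [
        ALPHABET[int(bits[i:i + BITS_PER_RESIDUE], 2)]
        for i in range(0, len(bits), BITS_PER_RESIDUE)
    ]
    chunks = ["".join(residues[i:i + block]) for i in range(0, len(residues), block)]
    return SPACER.join(chunks) + SPACER


def digest(protein: str) -> list:
    """Simulate tryptic digestion: cut after every arginine, drop the spacer."""
    return [peptide for peptide in protein.split(SPACER) if peptide]


def decode(peptides) -> bytes:
    """Rebuild the bit string from ordered peptides and drop the padding bits."""
    bits = "".join(
        f"{ALPHABET.index(residue):03b}" for peptide in peptides for residue in peptide
    )
    usable = len(bits) - len(bits) % 8  # padding is always shorter than one byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

For example, `decode(digest(encode(b"protein data")))` returns the original bytes, and every simulated peptide is at most `block` residues long.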

Similar Papers
  • Dissertation
  • 10.14264/uql.2019.725
A distributed in-memory database system for large-scale spatial-temporal trajectory data
  • Aug 16, 2019
  • The University of Queensland
  • Douglas Alves Peixoto

Spatial-temporal trajectory data contain rich information about moving objects and phenomena, and hence have been widely used in a great number of real-world applications. However, the ubiquity and complexity of spatial-temporal trajectory data have made it challenging to store, process, and query such data efficiently. Furthermore, the increasing number of users also challenges the ability of trajectory-based services and analytics to handle the query workload and respond to multiple requests in a satisfactory time. Over the last few years, a new class of systems has emerged to handle large amounts of data efficiently, referred to as distributed in-memory database systems. These systems were designed to overcome the difficulty of scaling the traditional structured and unstructured data loads that some applications must handle. Spark has become the framework of choice for large-scale low-latency data processing using distributed in-memory computation. However, Spark-based systems still lack the ability to handle several trajectory database tasks in a memory-wise manner. Some desirable features of trajectory database systems include data preparation and pre-processing, large-scale data storage and retrieval, and multi-user concurrent query processing. Providing a full-fledged system architecture supporting these features is challenging and remains an open issue. Therefore, driven by the increasing interest in scalable and efficient systems for trajectory-based analytics, we propose a distributed in-memory database system for memory-wise storage and scalable processing of spatial-temporal trajectory data, with low query latency and high throughput. We build our system on top of the Spark MapReduce framework, which provides an in-memory and fault-tolerant environment for distributed parallel processing of large-scale data.
Existing works on spatial data in MapReduce, however, either lack support for spatial-temporal trajectory data or provide only disk-based storage with costly I/O, which negatively affects query performance. Furthermore, none of the state-of-the-art applications addresses the problem of memory-wise utilization, which is the main drawback of in-memory frameworks such as Spark. In this thesis we propose new features for the Spark framework in order to provide native support for spatial-temporal trajectory data, with low latency, high throughput, and memory-wise storage. Our architecture follows a complete framework for trajectory data storage and processing, with trajectory data preparation, data pre-processing, data storage, and concurrent query processing. Firstly, we provide a novel model for trajectory data representation, and a system for loading, parsing, integration, and compression of trajectory data. Secondly, we introduce a novel framework for trajectory pre-processing using map-matching on top of Spark, in order to achieve data quality by means of data cleaning and simplification. Finally, we introduce two novel approaches for data storage and multi-user trajectory query processing on top of Spark. In the first approach, we propose novel partitioning and storage methods focused on distance-based queries; in addition, we provide a system for evaluating trajectory distance measures, given the extensive number of techniques available.
In the second approach, we propose a novel memory-wise and workload-aware system for trajectory data storage, focused on data retrieval and spatial-temporal queries over large-scale trajectory data; a key feature of our system is the ability to identify query hotspots and exchange data between main memory and disk based on the query workload, while still leveraging the scalability, fault-tolerance, efficiency, and concurrency control features of Spark. Although current techniques provide a good starting point for trajectory data management on top of Spark, they are unable to provide all the features of our work. The superiority of our architecture comes from the research and development of both novel and state-of-the-art techniques for trajectory data management, using a well-established framework for large-scale data applications. We believe our system will support scientists and professionals working with large-scale trajectory-based applications.

  • Conference Article
  • Citations: 2
  • 10.1109/icoac.2013.6921986
EDSRPPC: An efficient data storage and retrieval through personalization and prediction in cloud
  • Dec 1, 2013
  • P Varalakshmi + 4 more

In cloud computing, remote, massive, and scalable data storage services are provided to users. Data storage and retrieval services need to be modernized for secure communication in the cloud. In this paper, user-centric personalization and prediction-based searching techniques are proposed to improve data storage and retrieval services in the cloud. Personalization is the concept of categorizing the data based on the user's needs; prediction is the retrieval of the relevant files needed by a user based on the user's requirements. Personalization is achieved by categorizing the user's file storage. The middleware supports the data owner by performing all the encryption and hashing required for secure storage. The data in the cloud server is partitioned into clusters based on category and similarity. When a user requests a file, the middleware receives a set of similar files from the cloud server, from which the most relevant file is predicted and returned to the user based on the search log and access rights of the user, using boosting and bagging techniques. The performance of the EDSRPPC system is measured by the rate of relevancy of the returned files and by the time required to retrieve the files from cloud storage. The experimental results show that the EDSRPPC system is an efficient personalization and prediction framework for data storage and retrieval.

  • Conference Article
  • Citations: 4
  • 10.1109/iwcmc48107.2020.9148227
An Efficient Hadoop-Based Framework for Data Storage and Fault Recovering in Large-Scale Multimedia Sensor Networks
  • Jun 1, 2020
  • Ghina Saad + 4 more

We live in the big data era, where every event and thing about us is monitored and recorded for later analysis. In addition, such big data is collected at high speed (velocity) and does not fit a fixed structure (unstructured type). One of the largest contributors of big data in this era is the wireless multimedia sensor network (WMSN). Typically, a WMSN consists of a set of sensors that collect three types of data about a zone of interest: numerical values, images, and videos. The big data collected in a WMSN, along with the dense deployment of the network, especially in large-scale zones, poses real challenges for end users in terms of data storage and processing. In this paper, we propose an efficient and robust Hadoop-based framework for big data collection, processing, and storage in WMSN. The proposed framework relies on Hadoop ecosystem tools and introduces two fault detection algorithms (moving average and exponential smoothing) in order to preprocess data before storage. Using real sensor data of various types, we show the effectiveness of our framework in terms of processing and storage speed and the regeneration of missing data.
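
The two fault detection techniques named above, moving average and exponential smoothing, can be sketched as follows. The abstract does not describe the paper's actual algorithms, so this is only an illustration under stated assumptions: a reading is treated as faulty when it is missing (`None`) or deviates from the prediction by more than a tolerance, and faulty readings are replaced by the prediction. The function names, `window`, `alpha`, and `tol` are hypothetical.

```python
def moving_average_clean(readings, window=3, tol=5.0):
    """Flag missing or outlying readings against the mean of the last `window`
    cleaned values, and substitute that mean for them (assumed policy)."""
    cleaned, flagged = [], []
    for i, x in enumerate(readings):
        history = cleaned[-window:]
        if not history:
            if x is None:
                raise ValueError("first reading must be present")
            cleaned.append(float(x))
            continue
        predicted = sum(history) / len(history)
        if x is None or abs(x - predicted) > tol:
            flagged.append(i)
            cleaned.append(predicted)  # regenerate the faulty or missing value
        else:
            cleaned.append(float(x))
    return cleaned, flagged


def exp_smoothing_clean(readings, alpha=0.5, tol=5.0):
    """Same policy, but the prediction is a simple exponentially smoothed level."""
    cleaned, flagged = [], []
    level = None
    for i, x in enumerate(readings):
        if level is None:
            if x is None:
                raise ValueError("first reading must be present")
            level = float(x)
            cleaned.append(level)
            continue
        if x is None or abs(x - level) > tol:
            flagged.append(i)
            value = level  # regenerate from the smoothed level
        else:
            value = float(x)
        cleaned.append(value)
        level = alpha * value + (1 - alpha) * level  # update the smoothed level
    return cleaned, flagged
```

With the readings `[20.0, 21.0, None, 20.5, 80.0, 21.0]`, both functions flag indices 2 (missing) and 4 (spike) and fill them with predicted values before the data would be stored.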

  • Research Article
  • Citations: 9
  • 10.1126/sciadv.adw2613
Photonic microspheres for high-capacity DNA data storage: Robust, straightforward, and scalable random access via nonfading indexes
  • Jun 20, 2025
  • Science Advances
  • Qiu-Jun Liu + 7 more

DNA-based storage offers a compelling alternative to traditional optical and magnetic devices. However, random data access usually requires additional noncoding primer DNA as indexes, which substantially reduce the physical data density. Here, we propose an alternative strategy to overcome this barrier by loading different data-encoding DNA files into porous microspheres, each distinguished by unique photonic bandgaps and diameters, allowing for 105 types of indexing. With the two features as the addressing indexes, the physical separation of subsets from a diverse pool of DNA files is achieved, thereby facilitating the selective retrieval of stored data. The interconnected nanopore arrays and uniformly distributed positive charges within the microspheres enhance DNA enrichment, achieving a storage density of up to 22.6 exabytes per gram, far exceeding that of previous random access storage methods. This work outlines a simple and scalable method for creating nonfading photonic indexes, enabling long-term random access while maintaining high storage capacity.

  • Book Chapter
  • Citations: 2
  • 10.1007/978-981-10-4603-2_17
Hierarchical Metadata-Based Secure Data Retrieval Technique for Healthcare Application
  • Oct 25, 2017
  • Sayantani Saha + 2 more

The metadata representation in the context of data privacy is one of the biggest challenges in data security. The segregation of the data attributes for secure data retrieval as well as data storage is the primary motivation for this work. The metadata-based data retrieval is performed for providing a layer of protection on the actual data set and also for efficiently searching the location of encrypted data. The paper aims to provide a technique for representing the metadata of the encrypted e-health data for a secure data retrieval process. The hierarchical representation of metadata helps in retrieving the data in an efficient way so that access to the sensitive information could be controlled. The paper proposes a novel technique for metadata design for secure data retrieval. E-health data is fragmented over multiple servers based on sensitive attribute and sensitive association. A brief overview of the data protection and data retrieval techniques with respect to the proposed metadata representation is also presented.

  • Research Article
  • 10.1021/acsami.5c25216
Toward Portable DNA Data Storage: A Paper-Based System for Rewritable and Random Access.
  • Apr 13, 2026
  • ACS applied materials & interfaces
  • Qian Liu + 4 more

DNA offers immense archival potential due to its high density and longevity. However, practical use faces challenges, including biochemical constraints limiting random or repetitive data access and dependence on bulky lab equipment, which hinders portability. This work proposes a low-cost, high-density, and portable cellulose paper platform for DNA data storage. DNA pools containing distinct files were immobilized on two cellulose paper sheets for array preservation, with physical addresses assigned to enable random data access. Subsequently, through a constant-temperature (room temperature) amplification reaction, in situ random access to the data was achieved, eliminating reliance on thermocyclers. Meanwhile, the original DNA pool was recovered by using the magnetic bead method, enabling three repeated data writes and reads on the surface of cellulose paper. This approach harnesses cellulose paper for practical and portable DNA data archiving, presenting a promising solution to the problem of massive data storage.

  • Conference Article
  • 10.1109/wcse.2009.69
Construction and Application of SOTER-LUCC Database in Fujian Province
  • Jan 1, 2009
  • Zhi-Qiang Chen + 1 more

The Soil and Terrain Digital Database (SOTER) classifies regional land according to terrain, rock, and soil, providing a comprehensive framework for data storage and retrieval. SOTER has been authorized and has become one of the hotspots in studies of soil and the sustainable use of land. Combining SOTER and LUCC, with SOTER units taken as the frame for LUCC information, a SOTER-LUCC database will provide more abundant and reliable data sources and can be used for a wide range of applications at different scales. Backed by "3S" technology, the study builds a moderate-scale SOTER-LUCC database for Fujian province. The database has been applied to assessing the relative sensitivity of ecosystems to acid deposition and to selecting suitable sites for Euphoria Longan in Fujian province, and it proved feasible.

  • Research Article
  • Citations: 238
  • 10.1093/bioinformatics/15.6.510
An ontology for bioinformatics applications.
  • Jun 1, 1999
  • Bioinformatics
  • P G Baker + 5 more

An ontology of biological terminology provides a model of biological concepts that can be used to form a semantic framework for many data storage, retrieval and analysis tasks. Such a semantic framework could be used to underpin a range of important bioinformatics tasks, such as the querying of heterogeneous bioinformatics sources or the systematic annotation of experimental results. This paper provides an overview of an ontology [the Transparent Access to Multiple Biological Information Sources (TAMBIS) ontology or TaO] that describes a wide range of bioinformatics concepts. The present paper describes the mechanisms used for delivering the ontology and discusses the ontology's design and organization, which are crucial for maintaining the coherence of a large collection of concepts and their relationships. The TAMBIS system, which uses a subset of the TaO described here, is accessible over the Web via http://img.cs.man.ac.uk/tambis (although in the first instance, we will use a password mechanism to limit the load on our server). The complete model is also available on the Web at the above URL.

  • Book Chapter
  • Citations: 3
  • 10.1007/978-981-13-1165-9_35
Reduction of Digital Forensic Evidence Using Data Science
  • Sep 29, 2018
  • Devesh Kumar Srivastava

The hasty headway in the field of information technology has led the way to an escalating crime rate that is technically exhaustive. Crimes involving digital tools and devices yield forensic evidence. An upsurge in digital evidence is coalesced with the growing size of storage devices. Because traditional analysis methods cannot handle the colossal amount of digital data, forensic investigators have to adopt big data analytics to store, recover, and analyze the digital evidence. The storage of digital evidence calls for surveillance and security, thereby preserving its evidential significance. Digital analysis and fraud detection make the recovery and storage of digital data achievable through effective data reduction and by exploiting the features of data mining for storage and data archiving. Advancement in forensic analysis assures automated management of digital data, thus safeguarding the sensitivity of the data. The paper aims to take the facets of data reduction for efficient storage and retrieval of digital data, and an overall digital forensic research framework is outlined. The proposed work supports the existing framework for data reduction and storage. It also outlines the challenges and the unaddressed aspects of digital forensics. In this paper, I also discuss the unaddressed aspects of forensic investigations and peek into the loopholes and the opportunity realms that can lay the groundwork for future work.

  • Research Article
  • Citations: 17
  • 10.1016/j.ifacol.2018.09.318
A Platform for Fault Diagnosis of High-Speed Train based on Big Data
  • Jan 1, 2018
  • IFAC-PapersOnLine
  • Quan Xu + 7 more


  • Research Article
  • Citations: 8
  • 10.1016/j.jisa.2023.103482
Designing attribute-based verifiable data storage and retrieval scheme in cloud computing environment
  • Apr 17, 2023
  • Journal of Information Security and Applications
  • Sourav Bera + 4 more


  • Book Chapter
  • Citations: 3
  • 10.1007/978-3-540-79996-2_6
Knowledge Acquisition and Data Storage in Mobile GeoSensor Networks
  • Jan 1, 2008
  • Peggy Agouris + 3 more

In this paper we address the issue of mobility in geosensor networks, inspired by the computational challenges imposed by modern surveillance applications. More specifically, we consider networks of optical sensors (video and still cameras), and present a spatiotemporal framework for the management of information captured in them. In this context, mobility is addressed at two levels, considering mobile objects in the area monitored by a network, and mobile sensors observing such objects. Our interest lies in the data acquisition and storage problems that arise in this setting. We identify certain key issues behind the development of a general framework for knowledge acquisition and data storage in geosensor networks, namely: spatiotemporal object modeling; similarity metrics to compare spatiotemporal objects; storing and indexing spatiotemporal objects in a geosensor network; and network management using spatiotemporal techniques. We present some emerging approaches that address these key issues and thus outline a general framework for information and sensor management in mobile sensor networks.

Keywords: Mobility, surveillance, modeling, spatiotemporal similarity, indexing

  • Book Chapter
  • Citations: 7
  • 10.1007/978-981-15-2777-7_27
A Blockchain-Based Trustable Framework for IoT Data Storage and Access
  • Dec 23, 2019
  • Jiangfeng Li + 3 more

As the use of Internet of Things (IoT) devices is increasing dramatically, it is necessary to provide a trustable framework for storing and using IoT data inside or outside IoT devices. Blockchains and smart contracts provide solutions for constructing trustable environments for data storage and access. Unfortunately, current blockchain technology is suited only to situations in which a small or medium amount of data is stored and used; it performs poorly when a large amount of data gathered by IoT devices needs to be stored and accessed. This paper proposes a three-layer blockchain-based trustable framework for IoT data storage and access. In the framework, users, roles, permissions, data objects, and their relationships are formally defined. Based on these definitions, smart contracts with a role-based access control (RBAC) model are developed. Additionally, a snapshot mechanism is designed to collect IoT data in timestamp order and put it into files stored in the InterPlanetary File System (IPFS). We developed a prototype supply chain tracing system on Ethereum and IPFS for feasibility verification and performance evaluation of the proposed framework. The framework not only guarantees data integrity when storing IoT data, but also ensures data confidentiality when the IoT data is used. Moreover, simulation results illustrate that the prototype system performs well in terms of time, space, and gas consumption.

  • Research Article
  • Citations: 4
  • 10.4018/ijitwe.2021070103
A Robust Lightweight Data Security Model for Cloud Data Access and Storage
  • Jul 1, 2021
  • International Journal of Information Technology and Web Engineering
  • Pajany M + 1 more

In this paper, an efficient lightweight cloud-based data security model (LCDS) is proposed for building a secured cloud database with the assistance of intelligent rules, data storage, information collection, and security techniques. The major aims of this study are to introduce a new encryption algorithm to secure intellectual data, to propose a new data aggregation algorithm for effective data storage and improved security, and to develop an intelligent data merging algorithm for accessing encrypted and original datasets. The major benefit of the proposed model is fast encryption at the time of data storage and reduced decryption time during data retrieval. In this work, the authors propose an enhanced version of the hybrid crypto algorithm (HCA) for cloud data access and storage. The proposed system provides secured storage for data within the cloud.

  • Research Article
  • 10.5075/epfl-thesis-6796
Integrating Gene Synthesis and MITOMI for Rapid Protein Engineering
  • Jan 1, 2016
  • Infoscience (Ecole Polytechnique Fédérale de Lausanne)
  • Matthew Blackburn

The research described within this thesis is primarily motivated by two fields of science with reversed, yet complementary, approaches to addressing the same essential objective: manipulating and understanding the complex program of life. Whereas synthetic biology has to do with the bottom-up construction of living systems from a library of ‘parts’, systems biology takes a top-down, reverse-engineering analysis of the same processes within functioning cells to elucidate the crucial elements and interactions. Both perspectives have led to important contributions in learning how biological systems operate. Discoveries from these studies can have far-reaching implications, notably in medicine, manufacturing, energy, and the environment. One exciting outcome of the research efforts within synthetic biology is the ability to design genetic constructs encoding proteins with novel, optimized functions. Although advances have been made in predicting how a protein’s amino acid sequence relates to its higher order structural form and behavior, experimental methods for protein engineering remain invaluable for evaluating computationally derived designs. Unfortunately, the design-build-test cycle for rational protein development is commonly a time, labor and resource intensive endeavor. This is primarily due to limitations in the ability to generate libraries of gene variants and bottlenecks in cell-based expression and quantitative screening of protein products. This thesis describes the development of a solid-phase gene assembly technique and its integration with microfluidic protein analysis to accelerate the prototyping of novel protein designs. The gene assembly method utilizes commodity-scale, chemically manufactured DNA, and operates without the need for ligase or restriction enzymes. As a bench top process amenable to scale up, the modularity of the assembly process allows the efficient and economical generation of expression-ready gene variants. 
We directly coupled this gene assembly pipeline to a microfluidic device to enable high-throughput, on-chip, cell-free protein expression, purification, and characterization. By circumventing molecular cloning and cell-based steps, the lag time between protein design and quantitative analysis was dramatically reduced. As a proof of concept, this protein engineering platform was applied towards the construction of over 400 artificial, engineered variants of C2H2 zinc finger (ZF) proteins. ZF protein domains are the most prevalent form of transcription factor (TF) in humans, and are found throughout the tree of life. They provide a convenient structure for refactoring due to their relatively small size and composability, and are an ideal model for exploring the biophysics of TF-DNA specificity. The ability to engineer ZF proteins makes them useful as programmable, DNA-targeting units for applications in biotechnology and synthetic biology. We demonstrate that although ZFPs can be readily engineered to recognize a particular DNA target, engineering the precise binding energy landscape remains a challenge. Additionally, we show that ZF-DNA binding affinity can be tuned independently of sequence specificity. Together, these results demonstrate the versatility of the coupled gene assembly and microfluidic analysis platform as a new tool for rational protein engineering.
