Abstract
The Texas Advanced Computing Center and the Institute for Classical Archaeology at the University of Texas at Austin developed a method that uses iRods rules and a Jython script to automate the extraction of metadata from digital archaeological data. The first step was to create a record-keeping system to classify the data. The record-keeping system employs file and directory hierarchy naming conventions designed specifically to maintain the relationships between the data objects and to map the archaeological documentation process. The metadata implicit in the record-keeping system is automatically extracted upon ingest, combined with additional sources of metadata, and stored alongside the data in the iRods preservation environment. This method enables a more organized workflow for the researchers, helps them archive their data close to the moment of data creation, and avoids error-prone manual metadata input. We describe the types of metadata extracted and provide technical details of the extraction process and of the storage of the data and metadata.
Highlights
The Institute for Classical Archaeology (ICA), a research unit at the University of Texas at Austin, has been conducting various overseas archaeological projects involving specialists in several countries.
After describing how the information implicit in this system is mapped to appropriate metadata schemas, we provide an overview of the task workflow that uses iRods rules and a Jython script to extract that metadata, along with metadata from additional sources.
We provide logic to remove the associated Metadata Encoding and Transmission Standard (METS) document as well, with iRods handling the clearing of the iRods Catalog (iCat) metadata for us.
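To make the idea of metadata being implicit in file and directory names concrete, the following is a minimal sketch, not the project's actual Jython script. The naming convention, the field names, and the function `extract_metadata` are all hypothetical and chosen only for illustration; the source does not spell out the real convention.

```python
import re

# Hypothetical convention (an assumption, not taken from the source):
#   <project>/<site>/<year>/<doc_type>/<object_id>_v<version>.<ext>
LEVELS = ("project", "site", "year", "doc_type")
FILENAME = re.compile(
    r"^(?P<object_id>[A-Za-z0-9-]+)_v(?P<version>\d+)\.(?P<format>[A-Za-z0-9]+)$"
)


def extract_metadata(path):
    """Return the metadata implied by a path, or raise ValueError."""
    parts = path.strip("/").split("/")
    # Each directory level carries exactly one metadata field.
    if len(parts) != len(LEVELS) + 1:
        raise ValueError("unexpected directory depth: %r" % path)
    match = FILENAME.match(parts[-1])
    if match is None:
        raise ValueError("file name does not follow the convention: %r" % parts[-1])
    metadata = dict(zip(LEVELS, parts[:-1]))
    # The file name contributes the object identifier, version, and format.
    metadata.update(match.groupdict())
    metadata["version"] = int(metadata["version"])
    metadata["format"] = metadata["format"].lower()
    return metadata
```

Under this assumed convention, calling `extract_metadata("projectx/site-a/2008/drawings/DR-0042_v2.TIF")` yields the project, site, year, and document type from the directory levels, plus the object identifier, version number, and file format from the file name. Rejecting paths that break the convention, rather than guessing, keeps malformed names from producing misleading metadata at ingest.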
Summary
An ongoing collaboration between TACC and ICA to manage and archive ICA’s evolving data collection resulted in a method to automate the capture of metadata from the collection’s record-keeping system during ingestion into a preservation and management environment. This metadata provides the necessary contextual and versioning information to render a history of the research process (Esteva et al., 2010). While ARK is used to manage the research data and its public presentation, it does not address the long-term preservation of the raw data, which is used extensively for detailed analysis and for publication. It is in ARK that object descriptions and context relationships are recorded, but the raw data objects themselves are represented only by an identification number and by versions of the objects in lower-quality data formats. ICA researchers stressed the need to ensure the long-term preservation of the raw data, their descriptions, and the relationships between associated data objects.