Abstract

A syntax-correcting CIF parser, COD::CIF::Parser, is presented that can parse CIF 1.1 files and accurately report the position and nature of the syntactic problems it discovers. In addition, the parser can automatically fix the most common and most obvious syntactic deficiencies of input files. Bindings for the Perl, C and Python programming environments are available. Based on COD::CIF::Parser, the cod-tools package for manipulating CIFs in the Crystallography Open Database (COD) has been developed. The cod-tools package has been used successfully for continuous updates of the data in the automated COD data deposition pipeline and to check the validity of COD data against the IUCr data validation guidelines. The performance, capabilities and applications of different parsers are compared.
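COD::CIF::Parser's own interfaces are not described in this summary, so what follows is only an illustration of what reporting "the position and the nature" of a syntactic problem can look like. The Python sketch below tokenizes a small, assumed subset of CIF 1.1 (data_ headers, tag-value pairs, loop_, quoted values, semicolon text fields and comments; save frames and other reserved words are ignored). It records each problem as a Diagnostic carrying a line, a column and a message. The names tokenize, store, parse_cif and Diagnostic are hypothetical. Its single automatic repair, closing an unterminated quoted value at the end of its line, is an assumed example of an "obvious" fix and not necessarily one that COD::CIF::Parser performs.

```python
import re
from dataclasses import dataclass


@dataclass
class Diagnostic:
    line: int
    column: int
    message: str
    fixed: bool = False  # True when the sketch repaired the problem itself


def tokenize(text, diags):
    """Yield (kind, text, line, column) tokens for an assumed subset of CIF 1.1."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line, lineno = lines[i], i + 1
        if line.startswith(";"):  # text field: runs until the next line starting with ';'
            body, j = [line[1:]], i + 1
            while j < len(lines) and not lines[j].startswith(";"):
                body.append(lines[j])
                j += 1
            yield ("value", "\n".join(body), lineno, 1)
            if j == len(lines):
                diags.append(Diagnostic(lineno, 1, "unterminated text field"))
                return
            lines[j] = " " + lines[j][1:]  # blank the closing ';', tokenize the rest
            i = j
            continue
        pos = 0
        while pos < len(line):
            ch, col = line[pos], pos + 1
            if ch.isspace():
                pos += 1
            elif ch == "#":
                break  # a comment runs to the end of the line
            elif ch in "'\"":
                # A quote closes the value only when followed by whitespace or end of line.
                close = re.compile(ch + r"(?=\s|$)").search(line, pos + 1)
                end = close.start() if close else len(line)
                if close is None:  # hypothetical automatic fix: close it at end of line
                    diags.append(Diagnostic(lineno, col, "unterminated quote closed at end of line", True))
                yield ("value", line[pos + 1:end], lineno, col)
                pos = end + 1
            else:
                end = re.compile(r"\s|$").search(line, pos).start()
                word, low = line[pos:end], line[pos:end].lower()
                kind = ("tag" if word.startswith("_") else "data" if low.startswith("data_")
                        else "loop" if low == "loop_" else "value")
                yield (kind, word, lineno, col)
                pos = end
        i += 1


def store(block, tag_token, value, diags):
    """Add a value to a block; CIF tags are case-insensitive, duplicates are errors."""
    _, tag, line, col = tag_token
    if tag.lower() in block:
        diags.append(Diagnostic(line, col, f"duplicate tag {tag}"))
    block.setdefault(tag.lower(), value)  # keep the first occurrence


def parse_cif(text):
    """Return ([(block name, {tag: value})], diagnostics sorted by position)."""
    diags, blocks, block = [], [], None
    tokens = list(tokenize(text, diags))
    k = 0
    while k < len(tokens):
        kind, word, line, col = tokens[k]
        k += 1
        if kind == "data":
            blocks.append((word[5:], {}))
            block = blocks[-1][1]
        elif block is None:
            diags.append(Diagnostic(line, col, f"'{word}' appears before any data_ header"))
        elif kind == "tag":
            if k < len(tokens) and tokens[k][0] == "value":
                store(block, tokens[k - 1], tokens[k][1], diags)
                k += 1
            else:
                diags.append(Diagnostic(line, col, f"tag {word} has no value"))
        elif kind == "loop":
            tags, values = [], []
            while k < len(tokens) and tokens[k][0] == "tag":
                tags.append(tokens[k])
                k += 1
            while k < len(tokens) and tokens[k][0] == "value":
                values.append(tokens[k][1])
                k += 1
            if not tags or not values or len(values) % len(tags):
                diags.append(Diagnostic(line, col, f"loop_ has {len(values)} values for {len(tags)} tags"))
            else:  # loop values are stored column-wise, one list per tag
                for n, tag_token in enumerate(tags):
                    store(block, tag_token, values[n::len(tags)], diags)
        else:
            diags.append(Diagnostic(line, col, f"value '{word}' is not preceded by a tag"))
    diags.sort(key=lambda d: (d.line, d.column))
    return blocks, diags
```

Consider a data block in which a tag has no value, a loop_ ends with an incomplete row, a tag is repeated and a quoted value is never closed. For that input, parse_cif returns four diagnostics ordered by position, each pointing at the line and column where the offending token starts, and only the last is flagged as fixed. Keeping diagnostics separate from the parsed values means a lenient caller could accept the repaired data, while a strict caller could reject any file whose diagnostic list is non-empty.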

Highlights

  • The strict mode of COD::CIF::Parser ensures that all CIFs in the COD adhere to the CIF description provided by the IUCr

  • COD::CIF::Parser was used to check every CIF stored in the COD Subversion repository for syntactic correctness

Summary

Introduction

Over the quarter century of its existence, the Crystallographic Information Framework (CIF, Crystallographic Information File; Hall et al., 1991) – a standard format for reporting and storing data pertaining to crystal structures – has been widely adopted as a standard for supplementary material by the International Union of Crystallography (IUCr) (Brown & McMahon, 2002) and has been used by the majority of crystallographic journals as well as structural databases [Inorganic Crystal Structure Database (http://www2.fiz-karlsruhe.de/icsd_home.html; Belsky et al., 2002), Cambridge Structural Database (http://www.ccdc.cam.ac.uk/products/csd/; Groom & Allen, 2014), CRYSTMET (http://www.tothcanada.com/databases.htm; Le Page & Rodgers, 2005) and Crystallography Open Database (COD; http://www.crystallography.net/; Grazulis et al., 2012)].

Examples of general-purpose CIF parsers include vcif (http://www.iucr.org/resources/cif/software/archived/vcif-1.2; McMahon, 2006b) and vcif (known by the name of the executable file cif2cbf; Todorov & Bernstein, 2008) in C, ucif (Gildea et al., 2011) in C++, cif2cif (Hall & Bernstein, 1996) in Fortran and PyCIFRW (Hester, 2006) in Python (van Rossum, 2003). Another noteworthy tool is the ZINC package (Stampf, 2004), which provides a set of converters from CIF to the ZINC format and allows convenient manipulation of data in a command-line environment. Since the syntax of CIF 1.1 is a subset of the more general STAR 1 format (Hall & Spadaccini, 1994), STAR parsers such as STAR::Parser (Bluhm, 2000) in Perl (Wall et al., 2000) and StarTools (Keller, 2013) in Java are also capable of parsing CIFs.
