The identification of genes by DNA sequence analysis is a formally unspecified pattern recognition problem. In practice, genes are identified by constructing and evaluating models that represent the spatial relations among components that can themselves be identified by pattern matching. This modeling is currently done interactively, with the aid of a variety of pattern-matching and statistical analysis tools. gm1 automates gene identification by integrating the application of these tools with automated model generation. Models of genes are constructed by a task-specific algorithm implemented using the MGR architecture, a general architecture for automated problem solving. The development of gm1 demonstrates the versatility of the MGR architecture as a tool for building automated systems for scientific data analysis.
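
The general idea of building gene models from pattern-matched components can be illustrated with a deliberately simple sketch. It is not gm1's algorithm and does not use the MGR architecture; it assumes only two component types (start and stop codons), and the names `GeneModel`, `find_components`, and `generate_models` are hypothetical. Components are located by pattern matching, and a model is kept only when its components satisfy a spatial constraint (the stop codon lies downstream of the start codon and in the same reading frame).

```python
import re
from typing import List, NamedTuple, Tuple


class GeneModel(NamedTuple):
    """Hypothetical model: a start codon paired with an in-frame stop codon."""
    start: int  # index of the first base of the start codon
    end: int    # index one past the last base of the stop codon


def find_components(seq: str) -> Tuple[List[int], List[int]]:
    """Locate candidate components by pattern matching.

    Lookahead patterns are used so that overlapping matches are reported.
    """
    starts = [m.start() for m in re.finditer(r"(?=ATG)", seq)]
    stops = [m.start() for m in re.finditer(r"(?=TAA|TAG|TGA)", seq)]
    return starts, stops


def generate_models(seq: str, min_codons: int = 2) -> List[GeneModel]:
    """Pair each start codon with the nearest downstream in-frame stop codon.

    min_codons is an illustrative evaluation threshold (assumption): models
    with fewer codons before the stop codon are discarded.
    """
    starts, stops = find_components(seq)
    models = []
    for s in starts:
        # Spatial constraint: stop is downstream and in the same reading frame.
        in_frame = [p for p in stops if p > s and (p - s) % 3 == 0]
        if in_frame:
            p = in_frame[0]  # stops are in ascending order, so this is nearest
            if (p - s) // 3 >= min_codons:
                models.append(GeneModel(s, p + 3))
    return models


if __name__ == "__main__":
    sequence = "CCATGAAATTTTAGGG"
    for model in generate_models(sequence):
        print(model, sequence[model.start:model.end])
```

A realistic system would use many more component types and statistical evidence when evaluating candidate models; the sketch shows only the two-stage structure of component matching followed by model construction.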