The field of bioinformatics has come into full view recently, primarily because of the significant advances made by the Human Genome Project and other systematic sequencing projects, and because all biologists now need to be able to apply these techniques, at some level, to their own research. It may come as a surprise to most readers that the origins of bioinformatics go back to the 1960s, with the pioneering work of Margaret Dayhoff and her colleagues, who studied what was then a limited number of protein sequences. Their work set the stage for the field as we know it today. Bioinformatics occupies a unique niche amongst the sciences, lying at the intersection of biology, genetics, biochemistry, computer science, mathematics, statistics, and numerous other allied fields. The inherent strength of bioinformatics comes from the relationships between investigators in these allied fields; collaborations between these individuals have led (and will continue to lead) to the development of novel methods and approaches, furthering advances in each of these areas. Such collaborations also make it possible to pilot experiments on computers and then verify the computational results in the laboratory. The central role of bioinformatics has been highlighted by numerous studies, including one by the Biomedical Information Science and Technology Initiative (BISTI; http://www.nih.gov/about/director/060399.htm). This task force underscored the importance of bioinformatics support and education and its critical role in the advancement of modern science; without bioinformatics-based techniques, the scientific community would not be able to extract, view, or analyze the data generated by any type of large-scale study, whether at the genomic, transcriptomic, or proteomic level.
It is quite apparent that, regardless of a biologist's area of expertise, a firm grasp of basic bioinformatic techniques will be an indispensable part of the “scientific arsenal” for tackling biological problems from now on. Current Protocols in Bioinformatics is designed to give the experimentalist insight into the types of data and protocols required to perform basic tasks in bioinformatics. More importantly, it explains how to understand and properly interpret the data these methods produce. The Current Protocols series is known for its fast and timely publication of valuable, cutting-edge methods. The topics described below reflect the planned content for the first year's worth of installments. One of the most important things that the Editors and the individual contributing authors can do is drive home the importance of manually inspecting the data produced by these methods: even though a particular method produces a result, that result may not be biologically relevant or may make no sense in the context of the experiment being performed. There is no substitute for manual inspection of results, and sophisticated users keep their “biology hat” on as they peruse the output provided by the computer. The overall organization of Current Protocols in Bioinformatics is the product of extensive discussion among the Editors, who have drawn on their own research and teaching experience to determine how best to convey a logical, workflow-based path through the concepts presented herein. Current Protocols in Bioinformatics begins with a discussion of the most commonly used sources of public data, giving the reader an appreciation for the types of questions that can be answered using publicly available databases (Chapter 1).
With this as a basis, the book then marches through the major topics within the field of bioinformatics. First, the reader is introduced to methods for recognizing functional domains at both the nucleotide and protein levels (Chapter 2). These concepts are expanded upon in the following chapter, devoted to similarity searching and the inference of homology, which gives the reader useful information on the differences between the available search algorithms and the reasons for finding homologs (Chapter 3). One of the major goals of the Human Genome Project is to identify all genes within the genome, and Chapter 4 is devoted to methods on this front, as well as to gene-finding strategies and cautions. Moving up in complexity, Chapter 5 covers molecular modeling, including methods such as homology model building and the visualization of molecular models. Chapter 6 explores the interrelationships between proteins from an evolutionary standpoint, giving the reader an understanding of the concepts behind both the conservation and the evolution of function within the cell. Chapters 7 and 8 convey the interrelatedness of molecular processes: Chapter 7 approaches this from the standpoint of gene expression and the analysis of gene expression patterns, and Chapter 8 from the standpoint of intermolecular interactions. Since so much of bioinformatics and computational biology depends upon databases, a thorough treatment of database construction is included (Chapter 9). While this may seem outside the scope of what some biologists would do themselves, more and more biologists are actively involved in creating databases to warehouse the data generated by their own laboratories. Chapter 10 deals with comparing large sequence-based data sets. In the near future, this volume will include three additional chapters.
One will focus on assembling sequences. A separate chapter will cover the computational aspects of applying mass spectrometry to biological questions in proteomics. Finally, we will address techniques that can be used at the RNA level, methods that are unfortunately often overlooked. Each chapter in this work represents a general subject area, with individual protocols contained in units within each chapter. In general, each unit describes a method and includes one or more protocols. Each protocol provides information on required resources, steps and annotations, data interpretation, and commentaries on the “hows” and “whys” of the method. In addition, each chapter has an overview unit, providing a broad perspective on the general subject area as well as any theoretical discussion the reader will need as a foundation for the material covered in the individual units within that chapter. Since this field is Web-intensive, links to useful resources are provided in each unit. Because this publication is, first and foremost, a compilation of techniques in bioinformatics, explanatory information aimed at giving the reader an intuitive grasp of the procedures is included. As stated above, chapters begin with overview units that provide biological context for the procedures that follow. Each unit contains an Introduction that describes how the protocols that follow connect to one another, and annotations within each protocol describe the particulars of each step in the method. Where relevant, the unit authors have provided sample data that the reader can use to reproduce the output presented in their unit. Readers are strongly encouraged to make use of these data sets (found on the Current Protocols Web site http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm), both to understand how to structure their own raw data and to gain first-hand experience with the methods themselves.
As one can imagine, none of this material is of any use without an explanation of how to interpret the output of a given method. Each protocol-based unit therefore provides a separate section on Guidelines for Understanding Results. The individual authors, experts in their respective fields, have taken great care to give the user a basic understanding of how to interpret their results. In some cases, examples of bad or misleading results are also given, helping the reader develop a critical perspective on the use of these methods. Finally, each protocol-based unit closes with a Commentary, giving background information on the historical and theoretical development of the method, the importance of critical parameters used in the protocol, and alternative approaches that could accomplish the same end. All units contain references to the primary literature, which the user is encouraged to read to gain a better appreciation for the methods described in the protocols. Many units in Current Protocols in Bioinformatics contain groups of protocols, each presented as a discrete series of steps. The Basic Protocol, presented first in each unit, is the generally recommended or most universally applicable approach. Alternate Protocols are provided where variations on the Basic Protocol can be employed to achieve similar ends, or where requirements for the end result differ from those for the Basic Protocol. Support Protocols describe additional steps that are required to perform the Basic or Alternate Protocols and that stand alone as “subroutines.” A series of appendices is provided, with information on concepts that apply across the individual chapters and units. These appendices include examples of common file formats, the interconversion of common file formats, basic Unix commands, and the use of X-Windows.
To remain accessible to the typical biologist, this manual places a strong emphasis on Web-based solutions. In many cases, though, a Unix-based method is described, either because it is the only type of solution available or because it provides distinct and significant advantages over any available Web-based version of the same program. Most of the protocols included in this manual are used by our own research groups as a routine part of our everyday work. As such, we have learned many of the intricacies of the programs and have made an effort to share this information with the readers of Current Protocols in Bioinformatics. Critical steps and parameters are annotated where appropriate, providing the reader with a “troubleshooting guide” as well as insight into “tricks of the trade.” The successful evolution of this manual into a resource that meets the needs of its readership depends not only upon the perspective and expertise of our colleagues, but also upon the observations, experiences, and suggestions of our readers. A reader-response survey can be found on the Current Protocols in Bioinformatics Web page http://www.currentprotocols.com, and we strongly encourage our readers to use this survey to send us their constructive comments. There are many individuals whom we must thank, without whose efforts this work would not have become a reality. First and foremost, our thanks go to all of the authors whose individual contributions make up this work. The expertise and professional viewpoints that these individuals bring to bear go a long way toward making this work's content as strong as it is. We also thank our Senior Editor, Ann Boyle, as well as our Developmental Editor, Shonda Leonard, for their wisdom, patience, and support in helping to shape Current Protocols in Bioinformatics into a strong, valuable resource for the biological community.
We are fortunate to have them on our team, and look forward to continuing our work with them as this manual grows and evolves. Other skilled members of the Current Protocols staff who contributed to the success of this project include Scott Holmes, Tom Cannon Jr., Michael Gates, and Joseph White. The extensive copyediting required to produce an accurate protocols manual was ably handled by Allen Ranz, Tom Downey, and Susan Lieberman.