For every experimental compound in the drug discovery pipeline, pharmaceutical companies generate huge amounts of data: chemical and physical data from a range of analytical techniques, and biological data from large-scale screening programs and lead optimization work. What has hampered drug discovery is not a lack of lead compounds or a dearth of experimental data, but a lack of effective and efficient computational tools to collect, store, manipulate, and analyze large volumes of data.

Scientists at many major pharmaceutical and biotechnology companies, including GlaxoSmithKline, Aventis, AstraZeneca, Hoffmann-La Roche, Merck, Novartis, Millennium, Exelixis, Immunex, Cytokinetics, Evotec, and Monsanto, are using ActivityBase (Figure 1), an integrated data management system, to collect and analyze data generated by high throughput screening (HTS), to store chemical structures and register novel compounds, and to integrate cheminformatic and bioinformatic datasets. ActivityBase manages data produced by HTS (sustained screening volumes in excess of 30,000 to 40,000 wells per working day) and ultra-HTS (sustained volumes greater than 100,000 wells per working day), and some companies are populating ActivityBase databases at a rate of approximately 20 million data points every six months. Many operational databases exceed tens of millions of rows, yet the software’s search engine can answer typical queries in seconds.

An abundance of data does not by itself add value to an experimental compound. Raw data do not demonstrate therapeutic efficacy, establish bioavailability, predict toxicity, or indicate drug-like properties. Successful discovery research depends on the ability to integrate diverse datasets from multiple sources and to extract information from raw data. It is this information that will guide and expedite decision-making, improve productivity, and add value.
This information also allows a company to decide whether to pursue a lead compound or to “fail” it early in the discovery process.

ActivityBase is based on IDBS’ generic data model for discovery research and can capture, manage, and store data from biological, chemical, and robotic systems. The ActivityBase 5.0 Suite integrates cheminformatic and bioinformatic data, providing a framework for converting data into information that can be applied to lead discovery and optimization (Figure 2). New functionality introduced in version 5.0 makes data collection and analysis more flexible and extends data integration capabilities. Joining AssayBase, which manages biological data, are three new software modules:
· StructureBase, for registering chemical compounds and searching molecular structures and related physicochemical data (Figure 3).
· ReactionBase, for storing, managing, and searching chemical reactions and reaction schemes (Figure 4).
· Natural Products, for managing the process of isolating active compounds from natural materials; it generates a genealogical trail that tracks the derivation of new chemical compounds from natural products.
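A genealogical trail of the kind attributed to the Natural Products module can be pictured as a chain of parent-child links running from a source material, through extracts and fractions, to each isolated or derived compound. The Python sketch below is a hypothetical illustration of that idea only; it is not IDBS's data model or API, and the class names (`Sample`, `GenealogyTrail`), sample IDs, and stage labels are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One node in a hypothetical derivation trail (illustrative only)."""
    sample_id: str
    kind: str                        # e.g. "source material", "extract", "fraction", "compound"
    parent_id: Optional[str] = None  # None marks the original natural material


class GenealogyTrail:
    """Minimal parent-pointer store; an assumed structure, not ActivityBase's."""

    def __init__(self) -> None:
        self._samples: dict[str, Sample] = {}

    def register(self, sample_id: str, kind: str, parent_id: Optional[str] = None) -> Sample:
        # Parents must be registered first and IDs are unique, so no cycles can form.
        if sample_id in self._samples:
            raise ValueError(f"duplicate sample id: {sample_id}")
        if parent_id is not None and parent_id not in self._samples:
            raise KeyError(f"unknown parent: {parent_id}")
        sample = Sample(sample_id, kind, parent_id)
        self._samples[sample_id] = sample
        return sample

    def lineage(self, sample_id: str) -> list[Sample]:
        """Return the chain from the original natural material down to sample_id."""
        chain = []
        current: Optional[str] = sample_id
        while current is not None:
            sample = self._samples[current]
            chain.append(sample)
            current = sample.parent_id
        chain.reverse()  # walked child-to-parent, so flip to source-first order
        return chain


# Usage with hypothetical IDs: a plant material, its extract, a fraction,
# the active compound isolated from it, and a semi-synthetic derivative.
trail = GenealogyTrail()
trail.register("NM-001", "source material")
trail.register("EX-001", "extract", parent_id="NM-001")
trail.register("FR-003", "fraction", parent_id="EX-001")
trail.register("CPD-042", "compound", parent_id="FR-003")
trail.register("CPD-043", "derivative", parent_id="CPD-042")
print(" -> ".join(s.sample_id for s in trail.lineage("CPD-043")))
# NM-001 -> EX-001 -> FR-003 -> CPD-042 -> CPD-043
```

Storing only a parent pointer per sample keeps registration cheap and makes the full derivation history recoverable by walking back to the root; a production system would also need to handle compounds derived from more than one parent, which this sketch deliberately leaves out.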