
Background: Altered networks of gene regulation underlie many complex conditions, including cancer. Inferring gene regulatory networks from high-throughput microarray expression data is a fundamental but challenging task in computational systems biology and its translation to genomic medicine. Although diverse computational and statistical approaches have been brought to bear on the gene regulatory network inference problem, their relative strengths and disadvantages remain poorly understood, largely because comparative analyses usually consider only small subsets of methods, use only synthetic data, and/or fail to adopt a common measure of inference quality.

Methods: We report a comprehensive comparative evaluation of nine state-of-the-art gene regulatory network inference methods encompassing the main algorithmic approaches (mutual information, correlation, partial correlation, random forests, support vector machines) using 38 simulated datasets and empirical serous papillary ovarian adenocarcinoma expression-microarray data. We then apply the best-performing method to infer normal and cancer networks. We assess the druggability of the proteins encoded by our predicted target genes using the CancerResource and PharmGKB webtools and databases.

Results: We observe large differences in the accuracy with which these methods predict the underlying gene regulatory network depending on features of the data, network size, topology, experiment type, and parameter settings. Applying the best-performing method (the supervised method SIRENE) to the serous papillary ovarian adenocarcinoma dataset, we infer and rank regulatory interactions, some previously reported and others novel. For selected novel interactions we propose testable mechanistic models linking gene regulation to cancer. Using network analysis and visualization, we uncover cross-regulation of angiogenesis-specific genes through three key transcription factors in normal and cancer conditions. Druggability analysis of proteins encoded by the 10 highest-confidence target genes, and by 15 genes with differential regulation in normal and cancer conditions, reveals 75% to be potential drug targets.

Conclusions: Our study represents a concrete application of gene regulatory network inference to ovarian cancer, demonstrating the complete cycle of computational systems biology research, from genome-scale data analysis via network inference, evaluation of methods, to the generation of novel testable hypotheses, their prioritization for experimental validation, and discovery of potential drug targets.
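To make the correlation-based family of inference methods named in the abstract more concrete, the following is a minimal sketch of an unsupervised baseline: it scores every gene pair by Pearson correlation and keeps pairs above a cutoff. It is not SIRENE (the supervised method the study found best) and not any of the nine evaluated implementations. The gene names, the toy expression values, the function names and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def infer_network(expression, threshold=0.8):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples.
    Edges are ranked by absolute correlation, strongest first.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    edges.sort(key=lambda edge: abs(edge[2]), reverse=True)
    return edges


# Hypothetical toy data: TargetA tracks TF1 closely, GeneX does not.
expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "TargetA": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],
    "GeneX": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
edges = infer_network(expression)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b}: r = {r:.3f}")
```

Correlation alone yields undirected edges and cannot separate direct from indirect regulation, which is one reason comparisons such as the one described above also consider partial correlation, mutual information and supervised approaches.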

Journal of Computational Biology, Vol. 18, No. 9. RECOMB - Comparative Genomics 2010. Guest Editor: Eric Tannier. Preface: Satellite Workshop on Comparative Genomics, Research in Computational Molecular Biology (RECOMB-CG 2010). Published online: 7 Sep 2011. https://doi.org/10.1089/cmb.2011.008p. Copyright 2011, Mary Ann Liebert, Inc. To cite this article: Preface: Satellite Workshop on Comparative Genomics, Research in Computational Molecular Biology (RECOMB-CG 2010). Journal of Computational Biology. Sep 2011. 1019-1021.

Journal of Computational Biology, Vol. 18, No. 3. Preface: 14th International Conference on Research in Computational Molecular Biology (RECOMB 2010). Bonnie Berger. Published online: 8 Mar 2011. https://doi.org/10.1089/cmb.2010.006p. Copyright 2011, Mary Ann Liebert, Inc. To cite this article: Bonnie Berger. Preface: 14th International Conference on Research in Computational Molecular Biology (RECOMB 2010). Journal of Computational Biology. Mar 2011. 205-205.

Nat. Biotechnol. 28, 935–942 (2010); published online 09 September 2010; corrected after print 7 December 2010.

In the version of this article initially published, the affiliation for Ken Fukuda was incorrect. The correct affiliation is Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan.

For most of his career, Terrence Sejnowski, a professor of computational neuroscience at the Salk Institute for Biological Studies and a Howard Hughes Medical Institute investigator, has peered at the brain with pin-sharp precision. By using simulations to make sense of experimental data, Sejnowski has helped link biophysical processes in the brain to human behavior. His research has revealed insights into a raft of phenomena from vision to sleep to brain disorders. They could lead to practical benefits: Bestriding the fields of computational biology, neuroscience, psychology, and education, Sejnowski and other researchers hope to usher the age of machine learning into the real world. Sejnowski tells PNAS how using machines to model and emulate human behavior could make a difference in our lives.

PNAS: How did you become interested in machine learning?

Sejnowski: One of the most challenging questions in neuroscience is how social behaviors emerge from brain processes underlying sensation, emotions, language, memory, and cognition. When we first set out to address this challenge, it occurred to us that one way by which physicists figured out phenomena like gravity and aerodynamics was by building devices that exploited those phenomena. So, we needed to build machines that work like the brain by using software and computer chips that would form circuits capable of interacting with humans through social signals. In collaboration with Paul Ekman, an expert on reading facial expressions, our goal was to make machines capable of interpreting facial expressions so that, someday, social robots could communicate with humans on their own terms.

PNAS: And where would we use these social robots?

Sejnowski: Javier Movellan, a computational neuroscientist at the University of California, San Diego’s Institute for Neural Computation, has built a social robot he calls Rubi that interacts with toddlers who are just beginning to learn language. One of the challenges for preschool teachers is classroom control; the kids are running all over the place, so it’s difficult for the lone teacher to help kids focus. Rubi engaged the kids, encouraged dialogue, and facilitated learning. So, the idea is to use robots as teaching assistants. But it’s still early days.

PNAS: How do you make robots emulate human social learning?

Sejnowski: The first step is to get the child to accept the robot as a learning partner rather than as a toy. By using mathematical theory and demonstration, Javier showed that the most crucial variable for interacting with humans is response time. If a robot does not respond to a child’s question within a certain time window, the child loses interest. Also, a child will look at an object to which a teacher is pointing, so robots should be capable of shared attention, another hallmark of human learning. Robots must also be capable of other important features of human learning, such as empathy and imitation, which come from recognizing human emotions. But again, it’s early days.

PNAS: All this smacks of artificial intelligence.

Sejnowski: This is very different from traditional approaches in artificial intelligence, where the goal is to create a cognitive machine that creates a model of the world and computes responses based on that model. That’s not how the brain generates behavior. With its limited capacity, the brain selects only the most important sensory inputs to process and the most effective responses to store. Thanks to its capacity for learning and memory, the brain is able to interact in a social way with relatively low bandwidth, which is partly what makes social robots feasible. By emulating biological intelligence, machine learning is heralding a new era.

PNAS: To many, a robot in the classroom is the stuff of science fiction. How do you convince policymakers that the investment is worth the payoff?

Sejnowski: First of all, the cat’s already out of the bag. It’s now a question of optimizing the technology for our own benefit. For example, social robots can serve as personal cognitive enhancers. Second, the idea would not be to replace teachers but to provide them with assistants. Besides helping teachers to hold toddlers’ attention in the classroom, social robots can stand in when teachers need to be briefly absent. Robots can help relieve teachers of some of their mundane duties so that teachers can serve as role models and tailor attention to individual students. That said, we can’t predict the full impact of these transformative technologies.

PNAS: Fair enough. So where’s the rub?

Sejnowski: It’s mainly in the resources. We’ve made sufficient progress in neuroscience and engineering to be able to overcome technical challenges to using machines in social contexts. But we need to scale up lab experiments, clearly calling for a major investment of resources. If we had a thousand Rubis, we could accelerate research and reduce costs. The other problem is societal. Will our institutions be able to adapt to the new environments that such endeavors will help create? That’s an open question.

PNAS: How will the new environment help children improve their cognitive skills?

Sejnowski: There’s a lot of emphasis on classroom learning of subjects like language, mathematics, and science, but to improve learning, we also need an emphasis on acquiring basic cognitive skills like attention, listening, and memory. We have evidence that social robots can help improve attention. Paula Tallal, codirector of the Temporal Dynamics of Learning Center (TDLC) in San Diego, has developed software already being used in classrooms across the country that can help children who have difficulties listening and hence, understanding language. Hal Pashler, also at TDLC, has studied a well-known phenomenon in memory research, the spacing effect, to find the optimal intervals for refreshing memory to help children retain learned material for many years. These are just a couple of examples of wide-ranging research in neuroeducation, a field dedicated to helping children become better learners.

PNAS: Your own work in the mid-1990s shed surprising light on reinforcement learning.

Sejnowski: We developed a computational model of the brain’s dopamine system, involved in reward-based learning, to understand how the dopamine neurons learn to make predictions about future rewards. This computational model has been confirmed in a wide range of settings using brain imaging in humans. As they learn new facts about the world, children use the dopamine system as a guide to finding the best sequence of steps to solve problems and to reach a goal. We are just beginning to understand how the different learning systems in the brain work together to produce the astonishing range of behaviors humans are capable of.
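The idea of learning to predict future rewards, as described in the last answer, is commonly illustrated with temporal-difference (TD) learning, in which a prediction error plays the role often attributed to the dopamine signal. The sketch below is a generic textbook TD(0) update on a short chain of time steps with a single reward at the end; it is an assumption-laden illustration, not the model Sejnowski and colleagues published. The function name, the number of steps, the learning rate `alpha` and the discount `gamma` are all hypothetical.

```python
def td_learning(n_steps=5, n_trials=200, alpha=0.1, gamma=1.0):
    """Learn reward predictions for each time step of a trial with TD(0).

    Each trial walks through n_steps time steps; a reward of 1.0 arrives
    only at the final step. Returns the learned prediction per time step.
    """
    # One extra entry for the terminal state, whose value stays at 0.
    values = [0.0] * (n_steps + 1)
    for _ in range(n_trials):
        for t in range(n_steps):
            reward = 1.0 if t == n_steps - 1 else 0.0
            # Prediction error: what happened minus what was predicted.
            delta = reward + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
    return values[:n_steps]


early = td_learning(n_trials=1)
late = td_learning(n_trials=200)
print("after 1 trial:   ", [round(v, 3) for v in early])
print("after 200 trials:", [round(v, 3) for v in late])
```

After a single trial only the step that immediately precedes the reward carries any prediction; with repeated trials the prediction propagates backward to earlier steps, which is the qualitative behavior that makes TD learning a popular account of reward prediction.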

Bioinformatics emerged about 50 years ago, but it advanced greatly during the early 1980s with the establishment of robust databases such as GenBank, EMBL, and the DNA Database of Japan (DDBJ). Bioinformatic routines were rapidly adopted once the main algorithms for sequence analysis became available worldwide. As in other scientific fields, bioinformatics had minimal impact in low-income countries of Latin America until the last decade. In this brief report we review the state of the art of bioinformatics in Colombia, where we found a few bioinformatics groups carrying out basic computational biology research. Bioinformatics in Colombia now has a promising outlook thanks to recent science policies adopted by the Colombian Government in order to establish a new model of sustainable scientific research. Finally, we conclude with some considerations for the proposed science model and describe different perspectives of interest for the Colombian scientific community.

This special issue collects five papers that were presented at the Mini EURO Conference on Computational Biology, Bioinformatics and Medicine organized by the EURO Working Group on Operational Research in Computational Biology, Bioinformatics and Medicine (EURO-CBBM). The conference was held in Rome in September 2008 and included 61 talks selected on the basis of extended abstracts and three invited talks. The objective of the conference was to bring together researchers developing and using state-of-the-art modeling and optimization approaches to solve problems in computational biology, bioinformatics and medicine. The conference program was planned to establish an effective forum for the exchange and discussion of current research, issues and future trends in the above-mentioned areas. The meeting was structured into 15 sessions on Protein Structure Analysis, Protein Structure Prediction, Structural Bioinformatics, Motif Recognition, Image Analysis, Dynamic Models, Networks, Data Mining Methods, DNA Sequencing, Optimization and Feature Extraction Methods and Microarray Analysis. Out of 61 presentations, 11 were evaluated for publication in the special issue. After two rounds of review, five papers were selected for publication. The first paper is “A Logic-Based Approach to Polymer Sequence Analysis” by R. Bruni. This paper presents an improved propositional logic based approach for the estimation of protein sequences from mass spectroscopy results. The correspondence

Future Medicinal Chemistry, Vol. 2, No. 6. Editorial. Free Access.
Role of open chemical data in aiding drug discovery and design
Anna Gaulton† and John P Overington
EMBL – European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
† Author for correspondence: anna.gaulton@ebi.ac.uk
Published online: 14 Jun 2010. https://doi.org/10.4155/fmc.10.191

Drug-discovery data

Researchers in large pharmaceutical companies typically draw on a wide range of data resources and tools to enable decisions regarding target selection, lead identification, optimization and candidate selection. Much of this information is either generated internally or licensed from commercial vendors. For example, access to large sets of screening results, patent databases and databases of clinical candidates can be used to identify chemical tools or leads for a target of interest or to assess competitive position. Additional data classes add incrementally to this view. For example, internally generated crystal structures, complexed with drug-like ligands, provide valuable information for structure-based drug design and lead optimization. Large numbers of absorption, distribution, metabolism and excretion (ADME) and toxicity measurements also allow the building of predictive models to prioritize compounds, select the best candidates for further development and attempt to minimize the risks of potential adverse effects.
By contrast, academic researchers have typically had to rely on a far smaller number of available public-domain resources, together with information scattered across the literature. Access to large chemical and pharmacological datasets has previously been limited, in part due to concerns about the potential loss of intellectual property associated with disclosing compound structures.

Public data & precompetitive initiatives

In recent years, however, there has been a significant increase in the availability of large-scale open data for drug discovery. In particular, the number and size of screening databases have expanded significantly. Initiatives such as the NIH Molecular Libraries Program [1] and the Broad Institute’s Chemical Biology Platform are making access to high-throughput screening (HTS) capabilities and the subsequent primary data more widely available to academic groups. These data are typically fed into public databases such as PubChem [2] and ChemBank [3]. Initial plans are also underway for a similar European infrastructure project (EU-openscreen), whose aim will be to connect a network of screening centres across Europe and provide access to the results via a common European chemical biology database [101]. In addition, the recent transfer of the ChEMBL database from the private sector into the public domain [102] will supplement existing activity databases such as BindingDB [4], IUPHAR-DB [5] and PDSP Ki [103]. Publishers could also play a part in this data-accessibility process by setting policies for deposition of screening data into public repositories (as is currently the case for sequence and protein structure data) and helping to standardize the way such data are reported. Nature Chemical Biology, for example, has already produced guidelines for the submission of screening data [6]. In the area of toxicity, several public screening initiatives are also underway, including the EPA ToxCast [7,8] and Tox21 [104] projects.
While these efforts are primarily focused around environmental chemicals, the resulting data may still be informative in a drug-discovery context.

In addition to screening and bioactivity information, there are also now an increasing number of large chemical structure repositories, providing access to tens of millions of compounds for applications such as virtual screening [9] (e.g., PubChem, ZINC [10] and GDB-13 [11]). Several other public domain databases containing drug discovery-relevant information are also being developed – for example, DrugBank [12] and DailyMed [105] provide information regarding approved drugs, ClinicalTrials.gov [106] provides data on clinical-stage experimental drugs and DSSTox [13,14] and TOXNET [15] collate toxicity information from a wide range of public sources.

The increasing availability of public data coincides with initiatives in the pharmaceutical industry aimed at reducing costs, for example via increased outsourcing and engaging in precompetitive activities. The establishment of the Pistoia Alliance (a not-for-profit consortium of pharmaceutical companies, institutes and technology vendors, established for the purpose of brokering common precompetitive needs [16]) and the European Innovative Medicines Initiative [17] are both helping to provide a driving force towards further development and integration of tools and databases within the public domain. Public–private partnerships, such as the Structural Genomics Consortium-led chemical probes initiative [18], are becoming increasingly common and, further to this, pharmaceutical companies are starting to release some of their own formerly proprietary data. GlaxoSmithKline, for instance, has recently announced that it will make a large dataset of 13,500 compounds with antimalarial activity publicly available [107].
It is expected that other companies will follow this lead.

The impact of open data

The availability of public large-scale datasets is likely to have a significant impact on academic, not-for-profit and industrial drug discovery. First, groups will gain access to the data they need for individual projects, for example, the rapid identification of high-quality tool compounds to help validate targets or profile disease models. Second, and perhaps more important, the datasets will encourage the development of new tools and predictive algorithms within the public domain, benefiting the widest possible community. A parallel to this can perhaps be seen when considering the vast array of bioinformatics tools and methods developed for functional annotation of proteins following the exponential growth in deposition of sequence and structure data since the early 1990s. A similar explosion and investment of funding in chemoinformatics and computational chemical biology research may help address many of the unmet needs in drug discovery and design. For example, databases of launched drugs and medicinal chemistry compounds could be data mined to discover key properties and rules related to successful drugs or to identify possible lead-optimization strategies and tactics. Large bioactivity datasets can be used to derive panels of quantitative structure–activity relationship or classification models, allowing prediction of compound activity from structure. Such predictions can contribute to the elucidation of the molecular targets of phenotypic assays, prediction or explanation of drug side effects and identification of potential drug repurposing opportunities through optimization of alternative activities.
Identification of new leads may also be accomplished through the application of structure-based virtual-screening methods such as docking and pharmacophore- or molecular similarity-based methods.

However, with all predictive methods, the quality and relevance of the training data are paramount in determining the accuracy and applicability domain of resulting models. HTS results are often uncurated and typically have a relatively high false-positive rate, for example. Dose response studies in published literature do not always adequately report negative results. Chemical structures may often be depicted or named incorrectly. As datasets become more readily available, we will see the emphasis move towards quality, in addition to indexing and organization of data, rather than raw quantity. Indeed, many analyses are already being published that assess the quality of public screening libraries and identify promiscuous or reactive compounds that could be responsible for many of the false-positive results [19,20] or investigate the accuracy of compound structures in various repositories [21]. Progress within just this one area will have a profound impact on improving the discovery rate of genuinely useful chemical probes as a starting point for the development of novel and safe therapeutics. With the increasingly rapid growth of these public-domain sources, ensuring quality and interoperability is going to pose significant challenges.

Accompanying the growth of open data and associated research activities, we are also starting to see increasing growth in the availability of open-source tools for chemical data processing and analysis.
For example, toolkits and workflow tools such as CDK/Taverna [22], Bioclipse [23], RDKit [108], KNIME [24] and OpenBabel [109] are gaining in popularity, allowing scientists to tap into the increasing number of available resources and facilitating data-mining efforts, without needing investment in expensive commercial software – this mirrors projects such as BioPerl for the bioinformatics research community. Similarly, efforts are underway to better integrate disparate chemical and drug-discovery data sources [25,26] and improve interoperability through the development of standards (e.g., the use of the InChI representation for chemical structures [27]). Further emphasis in this area will be essential to promote maximal utility of the data.

The changing face of drug discovery

Perhaps a logical extension of many of the developments discussed above is in acting as a catalyst for the collaboration of different groups and organizations on the actual process of drug discovery. While in most areas this poses questions around retention of intellectual property, several collaborative efforts are already underway in the area of neglected disease research. Not-for-profit organizations such as the Medicines for Malaria Venture and the Drugs for Neglected Diseases initiative have already been established for this purpose and a growing number of public collaborative drug-discovery resources are being established (e.g., the TDR Targets database [28] and The Synaptic Leap [110]).

In order for collaborative and academic drug-discovery efforts to really succeed, however, researchers will need access to the full range of tools and data available to those in industry. While this is becoming increasingly possible, datasets in some areas are still lacking. There is still only a limited amount of public information regarding the ADME properties of compounds, for example [29].
Without such data and the development of good-quality ADME models, potential lead compounds may lack the properties required for good bioavailability in vivo and may subsequently fail in early development. The pharmaceutical industry has also invested much time and money into identifying and eliminating causes of toxicity but, again, much of this information is not publicly available, meaning mistakes of the past risk being repeated. Finally, a large body of chemical structure, synthesis and pharmacology information is contained only within patent documents. Though these documents are readily available online, they are not in a suitably structured form for large-scale searching and analysis. Some efforts are underway to facilitate indexing of these documents. For example, OSRA is an open-source tool for conversion of graphical representations of compounds in documents into computer-readable formats, allowing images in patents to be extracted and searched by structure [30]. However, the extraction of other valuable data from patent texts remains a nontrivial task. Arguably, tackling this data-accessibility gap within the public domain could result in huge benefits in productivity and efficiency.

Future perspective

Formerly, the billions of dollars spent annually on research within the pharmaceutical industry provided industrial researchers with unparalleled access to critical tools and resources that were largely beyond the reach of academics, not-for-profits and SMEs. However, it is now becoming clear that this business model of drug-discovery research and development is not sustainable or cost effective [31], and we are seeing the drug-discovery industry, together with data publishers and funding agencies, adopt new business models based on increased outsourcing, collaborative skills transfer and precompetitive activities [32,33].
Ultimately, as the volume and quality of open data increase, we are likely to see a growth in enabled academic and collaborative drug discovery. There is also likely to be an increase in the number of small biotechnology/pharmaceutical companies, accompanied by a decrease in the amount of research carried out within the closed walls of large pharmaceutical companies; this trend will depend crucially on facile access to enabling data. Hopefully, a benefit of this change in model will be greater levels of innovation and a boost to the dwindling productivity of the drug-discovery industry as a whole.

Acknowledgements

The authors wish to thank the Wellcome Trust for a Strategic Award and the EMBL-EBI for additional support. We are grateful to the referees of this paper for their suggestions and improvements.

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

Bibliography

Papers of special note have been highlighted as: ▪ of interest; ▪▪ of considerable interest.

1 Austin CP, Brady LS, Insel TR, Collins FS. NIH Molecular Libraries Initiative. Science 306(5699), 1138–1139 (2004).
2 Wang Y, Bolton E, Dracheva S et al. An overview of the PubChem BioAssay resource. Nucleic Acids Res. 38(Database issue), D255–266 (2010).
3 Seiler KP, George GA, Happ MP et al. ChemBank: a small-molecule screening and cheminformatics resource database. Nucleic Acids Res. 36(Database issue), D351–D359 (2008).
4 Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 35(Database issue), D198–201 (2007).
5 Harmar AJ, Hills RA, Rosser EM et al. IUPHAR-DB: the IUPHAR database of G protein-coupled receptors and ion channels. Nucleic Acids Res. 37(Database issue), D680–685 (2009).
6 Inglese J, Shamu C, Gu R. Reporting data from high-throughput screening of small-molecule libraries. Nat. Chem. Biol. 3(8), 438–441 (2007). ▪▪ Important article calling for journals to enforce standards for the reporting of screening data.
7 Judson RS, Houck KA, Kavlock RJ et al. In vitro screening of environmental chemicals for targeted testing prioritization: the ToxCast project. Environ. Health Perspect. 118(4), 485–492 (2010).
8 Collins FS, Gray GM, Bucher JR. Toxicology. Transforming environmental health protection. Science 319(5865), 906–907.
9 Villoutreix BO, Renault N, Lagorce D, Sperandio O, Montes M, Miteva MA. Free resources to assist structure-based virtual ligand screening experiments. Curr. Protein Pept. Sci. 8(4), 381–411 (2007).
10 Irwin JJ, Shoichet BK. ZINC – a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45(1), 177–182 (2005).
11 Blum LC, Reymond JL. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 131(25), 8732–8733 (2009).
12 Wishart DS, Knox C, Guo AC et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 36(Database issue), D901–D906 (2008).
13 Richard AM. DSSTox website launch: improving public access to databases for building structure-toxicity prediction models. Preclinica 2, 103–108 (2004).
14 Richard AM, Williams CR. Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat. Res. 499(1), 27–52 (2002).
15 Hochstein C, Arnesen S, Goshorn J. Environmental health and toxicology resources of the United States National Library of Medicine. Med. Ref. Serv. Q. 26(3), 21–45 (2007).
16 Barnes MR, Harland L, Foord SM et al. Lowering industry firewalls: pre-competitive informatics initiatives in drug discovery. Nat. Rev. Drug Discov. 8(9), 701–708 (2009). ▪ Important paper describing the aims of pharmaceutical companies in setting up precompetitive initiatives.
17 Hunter AJ. The Innovative Medicines Initiative: a pre-competitive initiative to enhance the biomedical science base of Europe to expedite the development of new medicines for patients. Drug Discov. Today 13(9–10), 371–373 (2008).
18 Edwards AM, Bountra C, Kerr DJ, Willson TM. Open access chemical and clinical probes to support drug discovery. Nat. Chem. Biol. 5(7), 436–440 (2009). ▪ Details an important public–private partnership to develop freely available chemical probes for key targets.
19 Feng BY, Simeonov A, Jadhav A et al. A high-throughput screen for aggregation-based inhibition in a large compound library. J. Med. Chem. 50(10), 2385–2390 (2007).
20 Soares KM, Blackmon N, Shun TY et al. Profiling the NIH Small Molecule Repository for compounds that generate H2O2 by redox cycling in reducing environments. Assay Drug Dev. Technol. (2010) in press.
21 Young D, Martin T, Venkatapathy R, Harten P. Are the chemical structures in your QSAR correct? QSAR Comb. Sci. 27(11–12), 1337–1345 (2008). ▪ Informative article highlighting issues with data quality when building quantitative structure–activity relationship models.
22 Kuhn T, Willighagen EL, Zielesny A, Steinbeck C. CDK-Taverna: an open workflow environment for cheminformatics. BMC Bioinformatics 11(1), 159 (2010).
23 Spjuth O, Helmus T, Willighagen EL et al. Bioclipse: an open source workbench for chemo- and bioinformatics. BMC Bioinformatics 8, 59 (2007).
24 Berthold MR, Cebron N, Dill F et al. KNIME: The Konstanz Information Miner. In: Data Analysis, Machine Learning and Applications. Preisach C, Schmidt-Thieme L (Eds). Springer-Verlag, Berlin, 319–326 (2008).
25 Jentzsch A, Hassanzadeh O, Bizer C, Andersson B, Stephens S. Enabling tailored therapeutics with linked data. Presented at: The 2nd Workshop about Linked Data on the Web. Madrid, Spain, 20 April 2009.
26 Belleau F, Nolin MA, Tourigny N, Rigault P, Morissette J. Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inform. 41(5), 706–716 (2008).
27 Heller SR, McNaught AD. The IUPAC international chemical identifier (InChI). Chem. Int. 31(1), 7 (2009).
28 Agüero F, Al-Lazikani B, Aslett M et al. Genomic-scale prioritization of drug targets: the TDR targets database. Nat. Rev. Drug Discov. 7(11), 900–907 (2008).
29 Ekins S, Williams AJ. Precompetitive preclinical ADME/Tox data: set it free on the web to facilitate computational model building and assist drug development. Lab Chip 10(1), 13–22 (2010). ▪ Thorough discussion of issues with the availability of absorption, distribution, metabolism, excretion and toxicity data in the public domain and the potential advantages of releasing such data.
30 Filippov IV, Nicklaus MC. Optical structure recognition software to recover chemical information: OSRA, an open source solution. J. Chem. Inf. Model. 49(3), 740–743 (2009).
31 Munos B. Lessons from 60 years of pharmaceutical innovation. Nat. Rev. Drug Discov. 8(12), 959–968 (2009). ▪▪ Interesting and detailed analysis of trends in the productivity of the pharmaceutical industry throughout its history.
32 Melese T, Lin SM, Chang JL, Cohen NH. Open innovation networks between academia and industry: an imperative for breakthrough therapies. Nat. Med. 15(5), 502–507 (2009).
33 Munos BH, Chin WW. A call for sharing: adapting pharmaceutical research to new realities. Sci. Transl. Med. 1(9), 9 (2009).
101 EU OpenScreen. www.eu-openscreen.de
102 Wellcome Trust press release. www.wellcome.ac.uk/News/Media-office/Press-releases/2010/WTX058219.htm
103 PDSP Database. http://pdsp.med.unc.edu/pdsp.php
104 Tox21: Putting a lens on the vision of toxicity testing in the 21st Century. www.alttox.org/ttrc/overarching-challenges/way-forward/austin-kavlock-tice
105 DailyMed. http://dailymed.nlm.nih.gov/dailymed/about.cfm
106 Clinical Trials homepage. www.clinicaltrials.gov
107 GSK announces ‘open innovation’ strategy to help deliver new and better medicines for people living in the world’s poorest countries – press release. www.gsk.com/media/pressreleases/2010/2010_pressrelease_10009.htm
108 RDKit: cheminformatics and machine learning software. www.rdkit.org/
109 Open Babel: the open source toolbox. http://openbabel.org
110 The Synaptic Leap Homepage. www.thesynapticleap.org
1Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modelingDrug Discovery Today, Vol. 25, No. 9Molecular Structure Extraction from Documents Using Deep Learning13 February 2019 | Journal of Chemical Information and Modeling, Vol. 59, No. 3Patterns of database citation in articles and patents indicate long-term scientific and industry value of biological data resources11 February 2016 | F1000Research, Vol. 5Finding the right approach to big data-driven medicinal chemistryScott J Lusher & Tina Ritschel6 July 2015 | Future Medicinal Chemistry, Vol. 7, No. 10Markov Logic Networks for Optical Chemical Structure Recognition6 August 2014 | Journal of Chemical Information and Modeling, Vol. 54, No. 8Data-driven medicinal chemistry in the era of big dataDrug Discovery Today, Vol. 19, No. 7Open Innovation-Based Drug Discovery in Europe: Some Examples of National and Transnational European Initiatives Integrating Chemistry, Biology, and Technology Platforms4 April 2014Public Domain Databases for Medicinal Chemistry30 September 2013The promiscuous binding of pharmaceutical drugs and their transporter-mediated uptake into cells: what we (need to) know and how we can do soDrug Discovery Today, Vol. 18, No. 5-6Public Domain Databases for Medicinal Chemistry11 July 2012 | Journal of Medicinal Chemistry, Vol. 55, No. 16Taking Open Innovation to the Molecular Level - Strengths and Limitations7 August 2012 | Molecular Informatics, Vol. 31, No. 8Annotating Human P-Glycoprotein Bioassay Data7 August 2012 | Molecular Informatics, Vol. 31, No. 8Drug discovery in the age of systems biology: the rise of computational approaches for data integrationCurrent Opinion in Biotechnology, Vol. 23, No. 4TDR Targets: a chemogenomics resource for neglected diseases23 November 2011 | Nucleic Acids Research, Vol. 40, No. D1Collation and data-mining of literature bioactivity data for drug discovery21 September 2011 | Biochemical Society Transactions, Vol. 
39, No. 5Missing Value Estimation for Compound-Target Activity Data8 October 2010 | Molecular Informatics, Vol. 29, No. 10 Vol. 2, No. 6 Follow us on social media for the latest updates Metrics History Published online 14 June 2010 Published in print June 2010 Information© Future Science LtdAcknowledgementsThe authors wish to thank the Wellcome Trust for a Strategic Award and the EMBL-EBI for additional support. We are grateful to the referees of this paper for their suggestions and improvements.Financial & competing interests disclosureThe authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.No writing assistance was utilized in the production of this manuscript.PDF download

Future Medicinal Chemistry

Jun 1, 2010
Anna Gaulton

Background: An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. Results: SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters, as indicated by their F-scores, is consistently better than that of all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). Conclusions: Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, integrates TribeMCL, provides different cluster quality tools, can extract human-readable protein descriptions using GI numbers from NCBI, interfaces with external tools such as BLAST and Cytoscape, and can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
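As a rough illustration of the simplest baseline named in this abstract, connected component analysis, the following Python sketch groups proteins that are linked by any pairwise similarity at or above a cutoff. It is not taken from SCPS: the function name, the toy similarity scores and the threshold are all hypothetical, and the spectral method itself is not shown.

```python
from collections import defaultdict


def connected_component_clusters(similarities, threshold):
    """Cluster proteins by connected components of the thresholded similarity graph.

    `similarities` maps (protein_a, protein_b) pairs to a similarity score;
    an edge exists when the score is >= threshold (hypothetical convention).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        root_a, root_b = find(a), find(b)  # also registers both proteins
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    return sorted(sorted(members) for members in clusters.values())


# Toy, made-up similarity scores for five proteins.
toy_similarities = {
    ("P1", "P2"): 0.9,
    ("P2", "P3"): 0.8,
    ("P3", "P4"): 0.1,
    ("P4", "P5"): 0.7,
}
clusters = connected_component_clusters(toy_similarities, threshold=0.5)
print(clusters)
```

Because a single strong link is enough to merge two groups, this kind of "local" rule can chain unrelated families together, which is the weakness that global methods such as spectral clustering are meant to address.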

Nuclear magnetic resonance (NMR) spectroscopy is an important experimental technique that allows one to study protein structure and dynamics in solution. An important bottleneck in NMR protein structure determination is the assignment of NMR peaks to the corresponding nuclei. Structure-based assignment (SBA) aims to solve this problem with the help of a template protein that is homologous to the target, and it has applications in the study of structure-activity relationships and of protein-protein and protein-ligand interactions. We formulate SBA as a linear assignment problem with additional nuclear Overhauser effect (NOE) constraints, which can be solved within the nuclear vector replacement (NVR) framework (Langmead, C., Yan, A., Lilien, R., Wang, L. and Donald, B. (2003) A Polynomial-Time Nuclear Vector Replacement Algorithm for Automated NMR Resonance Assignments. Proc. the 7th Annual Int. Conf. Research in Computational Molecular Biology (RECOMB), Berlin, Germany, April 10-13, pp. 176-187. ACM Press, New York, NY. J. Comp. Bio., (2004), 11, pp. 277-298; Langmead, C. and Donald, B. (2004) An expectation/maximization nuclear vector replacement algorithm for automated NMR resonance assignments. J. Biomol. NMR, 29, 111-138). Our approach uses NVR's scoring function and data types and also gives the option of using CH and NH residual dipolar couplings (RDCs), instead of the NH RDCs that NVR requires. We test our technique on NVR's data set as well as on four new proteins. Our results are comparable to NVR's assignment accuracy on NVR's test set, but higher on novel proteins. Our approach allows partial assignments. It is also complete and can return the optimum as well as near-optimum assignments. Furthermore, it allows us to analyze the information content of each data type and is easily extendable to accept new forms of input data, such as additional RDCs.
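The idea of a linear assignment problem with extra NOE constraints can be illustrated with a toy Python sketch. This is not the authors' binary integer programming solver and does not use NVR's scoring function: it simply enumerates every peak-to-residue assignment, discards those that violate a hypothetical NOE constraint, and returns the cheapest ones. The cost matrix, the contact set and the NOE pair below are invented for illustration.

```python
from itertools import permutations


def best_assignments(cost, noe_pairs, contacts, keep=2):
    """Return the `keep` lowest-cost peak->residue assignments that satisfy
    every NOE constraint, by exhaustive enumeration (toy sizes only).

    cost[peak][residue] : hypothetical mismatch score for assigning peak to residue
    noe_pairs           : pairs of peaks between which an NOE was observed
    contacts            : set of frozenset({res_i, res_j}) close in the template
    """
    n = len(cost)
    feasible = []
    for perm in permutations(range(n)):  # perm[peak] = residue
        # An NOE between two peaks requires their residues to be in contact.
        if all(frozenset((perm[p], perm[q])) in contacts for p, q in noe_pairs):
            total = sum(cost[peak][res] for peak, res in enumerate(perm))
            feasible.append((total, perm))
    feasible.sort()
    return feasible[:keep]


# Made-up data: three peaks, three residues.
cost = [
    [1, 4, 6],
    [2, 1, 7],
    [5, 3, 2],
]
contacts = {frozenset((0, 1)), frozenset((1, 2))}  # residues 0-1 and 1-2 are close
noe_pairs = [(0, 2)]                               # NOE observed between peaks 0 and 2

result = best_assignments(cost, noe_pairs, contacts, keep=2)
print(result)
```

In this toy case the unconstrained optimum (the identity assignment, cost 4) is rejected because residues 0 and 2 are not in contact, so the constraint changes the answer. Returning several ranked solutions mirrors the abstract's point that optimum and near-optimum assignments can both be reported; a real instance would need an integer programming solver rather than enumeration.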

Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. As a result, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.

The published articles in PLoS Computational Biology on the development of computational biology research in Mexico, Brazil, Cuba, Costa Rica, and Thailand have inspired us to report on the development of bioinformatics activities in Malaysia. Rapid progress in molecular biology research and biotechnology in Malaysia has created sufficient demand for bioinformatics. Although bioinformatics in Malaysia started in the early 1990s, the initial focus on the development of the biotechnology industry has curtailed the early gains and overshadowed the systematic development of bioinformatics, which is currently lacking in human capital development, research, and commercialization. However, government initiatives have been devised to develop the necessary national bioinformatics network and human resource development programs and to provide the necessary infrastructure, connectivity, and resources for bioinformatics. Stakeholders are reorienting themselves and consolidating existing strengths to align with the global trends in bioinformatics. This exercise is expected to reinvigorate the bioinformatics industry in Malaysia. Tapping into niche expertise and resources such as biodiversity and coupling them with the existing biotechnology infrastructure will help to create sustainable development momentum for the future. An initiative arose from several senior scientists across local universities in Malaysia to promote this new scientific discipline in the country.

Supported by the US National Science Foundation (NSF) and the International Society of Intelligent Biological Medicine (ISIBM), the IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School was designed dynamically in response to cutting-edge synergistic research and education. One of the key components of this academic event was the poster presentation, which focused on specific topics to foster collaboration between the computational biology and drug design domains. The Harvard meeting attracted over five hundred scientists, researchers and medical doctors from around the world to present, discuss and exchange their research. The synergies between computational biology and drug design research were well observed by participants. The poster sessions were designed to be responsive to the need for synergistic inter/multidisciplinary research and education. A panel of judges was formed to decide the best posters. The papers in this special issue were selected by a panel of judges as runners-up for the best poster award. Authors were then invited to expand their posters into full research papers. Submitted papers were required to contain significant additional scientific detail and were rigorously reviewed by at least three external reviewers. Detailed information regarding the academic event can be found in the White Paper of the IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School, published in BMC Genomics: http://www.biomedcentral.com/1471-2164/9/S2/I1.

With a wealth of biodiversity, a long tradition of agriculture-based industries, and an established medical and biotechnological research and development community, Thailand has become an attractive location for life sciences investment. The large amount of data generated in many areas of life sciences requires visualization, management, and analysis, principally through bioinformatics. To become successful, Thailand's research community should emphasize establishing core technologies, such as genomics and bioinformatics, to boost development of agriculture, food processing, and biomedical research. The Thai government realized the importance of this field and created a national policy to greatly increase Thailand's participation in bioinformatics and genomics, budgeting for specific development goals in research infrastructure, education, and sustainable human resources. Thailand has not lagged behind in bioinformatics research activity and recognizes the importance of bioinformatics through increased policy awareness, human resources development, and increased research activity involving genomic-scale data generation and computational analyses. Many applications of genomics and bioinformatics to biomedical research and development in Thailand have progressed substantially during the past few years, leading to successful applications in some specific local areas. However, the applications to other important areas, such as agriculture, are hampered by the limited availability of genomic sequence data and the lack of necessary biochemical/physiological information. With the advent of more and more genomic information in public databases, Thailand's research community is striving to adopt comparative genomics to obtain information of direct relevance to the country's health and industrial needs. 
This article highlights Thailand's contribution to genomics and bioinformatics in the following areas: (1) policy support from the Thai government, (2) capacity building through infrastructure/education/human resources, and (3) research and development in genomics and computational biology. (See Box 1 for Authors' Biographies.)

Box 1. Authors' Biographies

Wannipha Tongsima, M.S., obtained her master's degree in Industrial Microbiology from Chulalongkorn University, Thailand. She was involved in founding the Bioinformatics research program in BIOTEC. To reinforce the research activity in this area, she also helped organize the first International Conference on Computational Biology (InCoB), held in Bangkok in 2002. Later, she was appointed to manage one of the first BIOTEC ethnic-specific human genetic variation programs, named the Thailand SNP Discovery Project. She works as a Genomic Medicine program coordinator for the Cluster and Program Management Office (CPMO) of the National Science and Technology Development Agency (NSTDA), which is an umbrella organization of four other national research centers in Thailand, including BIOTEC.

Sissades Tongsima, Ph.D., received his doctoral degree in Computer Science and Engineering from the University of Notre Dame, Indiana, United States. He has worked for the National Electronics and Computer Technology Center on High Performance Computing (HPC) and Computational Grid. During 2002–2004, he cochaired the Asia-Pacific Advanced Network (APAN) Grid Working Group. In 2003, he shifted his research direction from HPC architecture to bioinformatics research, when he started working for BIOTEC, and constructed the ThaiSNP database. His main research interest is in developing algorithms and databases for analyzing various research projects on human genetic variation. He currently heads the Genome Institute biostatistics and informatics laboratory at BIOTEC.

Prasit Palittapongarnpim, M.D., earned his medical degree from Mahidol University, Thailand, and his B.S. in Mathematics from Ramkumhang University, Thailand. He is a Fellow of the Royal College of Pediatricians of Thailand and also an Associate Professor in Microbiology at Mahidol University, where he has conducted research focusing on tuberculosis. While holding a Deputy Director position, he initiated the Bioinformatics research program at BIOTEC in 2002 and led the organization of the first InCoB conference in 2002. He is currently a Vice President of NSTDA.

Genetically encoded probes based on Förster resonance energy transfer (FRET) enable us to decipher spatiotemporal information encoded in complex tissues such as the brain. First, this review focuses on FRET probes in which both the donor and acceptor are fluorescent proteins incorporated into a single molecule, i.e. unimolecular probes. The advantages of these probes lie in their easy loading into cells, the simple acquisition of FRET images, and the clear evaluation of data. Next, we introduce our recent study, which combines FRET imaging and in silico simulation. In nerve growth factor-induced neurite outgrowth in PC12 cells, we found positive and negative signaling feedback loops. We propose that these feedback loops determine neurite-budding sites. We would like to emphasize that it is now time to accelerate crossover research in neuroscience, optics, and computational biology.

Innovation in computational biology research is predicated on the availability of published methods and computational resources. These resources facilitate the generation of new hypotheses and observations both on the part of the creators and the scientists who use them. These methods and resources include Web servers, databases, and software, both complex and simple, that implement a specific procedure or algorithm. Usually, a resource is maintained by the laboratory in which it was initially developed. We would assert that there is a growing level of frustration among scientists who attempt to use many of these resources and find that they no longer exist or are not properly maintained.

Given $d>2$ and a set $Q$ of $n$ grid points in $\Re^{d}$, we design a randomized algorithm that finds a $w$-wide separator, which is determined by a hyper-plane, in $O(n^{2\over d}\log n)$ sublinear time such that $Q$ has at most $({d\over d+1}+o(1))n$ points on either side of the hyper-plane, and at most $c_{d}wn^{d-1\over d}$ points within $\frac{w}{2}$ distance to the hyper-plane, where $c_{d}$ is a constant for fixed $d$. In particular, $c_{3}=1.209$. To the best of our knowledge, this is the first sublinear time algorithm for finding geometric separators. Our 3D separator is applied to derive an algorithm for the protein side-chain packing problem, which improves and simplifies the previous algorithm of Xu (Research in computational molecular biology, 9th annual international conference, pp. 408–422, 2005).
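The separator conditions stated in this abstract can be checked for a given hyper-plane with a short Python sketch. It does not implement the randomized sublinear-time search; it only counts the points on each side of a candidate hyper-plane and within distance w/2 of it, then compares the counts with the stated bounds. The function names and the example grid are hypothetical, and the o(1) term is ignored as a simplification.

```python
import math
from itertools import product


def separator_stats(points, normal, offset, width):
    """Count points strictly on each side of the hyper-plane normal.x = offset
    and points within distance width/2 of it."""
    norm = math.sqrt(sum(a * a for a in normal))
    left = right = near = 0
    for p in points:
        signed = (sum(a * x for a, x in zip(normal, p)) - offset) / norm
        if abs(signed) <= width / 2:
            near += 1
        if signed < 0:
            left += 1
        elif signed > 0:
            right += 1
    return left, right, near


def within_bounds(left, right, near, n, d, width, c_d):
    """Check the balance and width bounds, dropping the o(1) term."""
    balanced = max(left, right) <= d / (d + 1) * n
    thin = near <= c_d * width * n ** ((d - 1) / d)
    return balanced and thin


# Example: an 8 x 8 x 8 grid (n = 512, d = 3) and the plane x = 3.25, width 1.
grid = list(product(range(8), repeat=3))
stats = separator_stats(grid, normal=(1, 0, 0), offset=3.25, width=1.0)
ok = within_bounds(*stats, n=len(grid), d=3, width=1.0, c_d=1.209)
print(stats, ok)
```

For this plane, one grid layer (64 points) lies inside the slab, which is below the bound of roughly 1.209 × 64 ≈ 77 points, and each side holds 256 points, below the balance bound of 384.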

In order to capture important subcellular dynamics, researchers in computational biology have begun to turn to mesoscopic models in which molecular interactions at the gene level behave as discrete stochastic events. While the trajectories of such models cannot be described with deterministic expressions, the probability distributions of these trajectories can be described by the set of linear ordinary differential equations known as the chemical master equation (CME). Until recently, it was believed that the CME could be solved analytically only for the most trivial of problems, and the CME has been analyzed almost exclusively with kinetic Monte Carlo (KMC) algorithms. However, concepts from linear systems theory have enabled the finite state projection (FSP) approach and have significantly enhanced our ability to solve the CME without resorting to KMC simulations. In this paper, we review the FSP approach and introduce a variety of systems-theory-based modifications and enhancements to the FSP algorithm. Notions such as observability, controllability, and minimal realizations enable large reductions and increase efficiency with little to no loss in accuracy. Model reduction techniques based upon linear perturbation theory allow for the systematic projection of multiple time-scale dynamics onto a slowly varying manifold of much smaller dimension. We also present a powerful new reduction approach, in which we perform computations on a small subset of configuration grid points and then interpolate to find the distribution on the full set. The power of the FSP and its various reduction approaches is illustrated on a few important models of genetic regulatory networks.
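The basic finite state projection idea described above can be sketched in Python for a one-species birth-death process. The CME is truncated to the states 0..n_max, the resulting linear ODE system is integrated, and the probability that has leaked out of the retained set gives the truncation error. The model, rate constants and function name are assumptions made for illustration. For simplicity the sketch integrates with a classical Runge-Kutta scheme rather than the matrix exponential usually associated with FSP, and it does not include any of the reduction techniques mentioned in the abstract.

```python
import math


def fsp_birth_death(k_birth, g_death, n_max, t_final, steps=2000):
    """Finite state projection for production at rate k_birth and
    degradation at rate g_death * n, keeping states 0..n_max.

    Returns (p, error) where p[n] approximates P(n molecules at t_final)
    and error = 1 - sum(p) is the probability lost to the truncation.
    """
    def rhs(p):
        # Right-hand side of the truncated CME, dp/dt = A_J p.
        dp = [0.0] * len(p)
        for n, pn in enumerate(p):
            dp[n] -= (k_birth + g_death * n) * pn   # all flux leaving state n
            if n + 1 < len(p):
                dp[n + 1] += k_birth * pn           # birth stays inside the projection
            if n > 0:
                dp[n - 1] += g_death * n * pn       # death n -> n-1
        return dp

    p = [1.0] + [0.0] * n_max                       # start with zero molecules
    h = t_final / steps
    for _ in range(steps):                          # classical RK4 step
        s1 = rhs(p)
        s2 = rhs([x + 0.5 * h * d for x, d in zip(p, s1)])
        s3 = rhs([x + 0.5 * h * d for x, d in zip(p, s2)])
        s4 = rhs([x + h * d for x, d in zip(p, s3)])
        p = [x + h / 6 * (a + 2 * b + 2 * c + e)
             for x, a, b, c, e in zip(p, s1, s2, s3, s4)]
    return p, 1.0 - sum(p)


# Hypothetical rates; for this model the exact solution is Poisson.
p, error = fsp_birth_death(k_birth=2.0, g_death=1.0, n_max=30, t_final=1.0)
mean = sum(n * pn for n, pn in enumerate(p))
expected_mean = 2.0 * (1.0 - math.exp(-1.0))
print(mean, expected_mean, error)
```

Because births out of the last retained state are counted as lost probability, the sum of `p` can only fall short of one, and that shortfall is the quantity the FSP uses to decide whether the projection is large enough.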


Articles published on Computational Biology Research

Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets

Preface: Satellite Workshop on Comparative Genomics, Research in Computational Molecular Biology (RECOMB-CG 2010)

Preface: 2nd Satellite Meeting on Bioinformatics Education, Research in Computational Molecular Biology (RECOMB-BE 2010)

Preface: 14th International Conference on Research in Computational Molecular Biology (RECOMB 2010)

Erratum: Corrigendum: The BioPAX community standard for pathway data sharing

QnAs with Terrence J. Sejnowski

Bioinformática en Colombia: presente y futuro de la investigación biocomputacional

Operations Research Models for Computational Biology, Bioinformatics and Medicine

Role of open chemical data in aiding drug discovery and design

SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale

NVR-BIP: Nuclear Vector Replacement using Binary Integer Programming for NMR Structure-Based Assignments.

The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*.

Bioinformatics in Malaysia: Hope, Initiative, Effort, Reality, and Challenges

Special RECOMB 2008 Issue

Promoting synergistic research and education in computational biology and drug design

Outlook on Thailand's Genomics and Computational Biology Research and Development

FRET imaging and in silico simulation: analysis of the signaling network of nerve growth factor-induced neuritogenesis

Computational Biology Resources Lack Persistence and Usability

Sublinear time width-bounded separators and their application to the protein side-chain packing problem

The Finite State Projection Approach for the Analysis of Stochastic Noise in Gene Networks
