Abstract

This user guide describes the rationale behind, and the modus operandi of, a Unix script-driven package for the evolutionary search of optimal Support Vector Machine model parameters as computed by the libsvm package, leading to support vector machine models of maximal predictive power and robustness. Unlike common libsvm parameterizing engines, the current distribution includes the key choice of best-suited sets of attributes/descriptors, in addition to the classical libsvm operational parameters (kernel choice, kernel parameters, cost, and so forth), allowing a unified search in an enlarged problem space. It relies on an aggressive, repeated cross-validation scheme to ensure a rigorous assessment of model quality. Primarily designed for chemoinformatics applications, it also supports the inclusion of decoy instances, for which the explained property (bioactivity) is, strictly speaking, unknown but presumably “inactive”, thus additionally testing the robustness of a model to noise. The package was developed with parallel computing in mind, supporting execution on both multi-core workstations and compute cluster environments. It can be downloaded from http://infochim.u-strasbg.fr/spip.php?rubrique178.

Highlights

  • Support vector machines (SVMs), implemented for instance in the very popular libsvm toolkit [1], are a method of choice for machine learning of complex, non-linear patterns of dependence between an explained variable, here termed the “property” P, and a set of attributes (descriptors D) thought to be determinants of the current P value of the instance/object they characterize

  • Here we address the general class of machine learning problems P = P(D)

  • The approach incorporates an important decision of the model-building process, the choice of descriptors/attributes and their best preprocessing strategies, into the optimization procedure. This is important because the employed descriptor space is a primary determinant of modeling success and determines the optimal operational parameters of libsvm
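
To make the idea of a unified search space more concrete, the following is a minimal Python sketch of how a single candidate solution might bundle a descriptor-space choice with libsvm operational parameters. It is not the package's actual encoding: the `Chromosome` class, the descriptor-space names, and the parameter ranges are all hypothetical. Only the `-t`, `-c`, and `-g` options are genuine `svm-train` flags (kernel type, cost, and gamma).

```python
import random
from dataclasses import dataclass

# Hypothetical descriptor-space labels, invented for illustration only.
DESCRIPTOR_SPACES = ["fragments_a", "fragments_b", "pharmacophore_c"]

# libsvm kernel codes as accepted by svm-train's -t option.
KERNELS = {0: "linear", 1: "polynomial", 2: "RBF", 3: "sigmoid"}


@dataclass(frozen=True)
class Chromosome:
    """One candidate: a descriptor choice plus libsvm parameters (assumed layout)."""
    descriptor_space: str
    kernel: int
    log10_cost: float
    log10_gamma: float

    def svm_train_args(self):
        """Translate the libsvm-related genes into svm-train options."""
        args = ["-t", str(self.kernel), "-c", f"{10 ** self.log10_cost:g}"]
        if self.kernel != 0:  # the linear kernel does not use gamma
            args += ["-g", f"{10 ** self.log10_gamma:g}"]
        return args


def random_chromosome(rng):
    """Draw a random point of the enlarged problem space (ranges are assumptions)."""
    return Chromosome(
        descriptor_space=rng.choice(DESCRIPTOR_SPACES),
        kernel=rng.choice(sorted(KERNELS)),
        log10_cost=rng.uniform(-2.0, 3.0),
        log10_gamma=rng.uniform(-4.0, 1.0),
    )


if __name__ == "__main__":
    candidate = random_chromosome(random.Random(0))
    print(candidate.descriptor_space, candidate.svm_train_args())
```

Because the descriptor choice sits in the same object as the kernel and cost genes, an evolutionary operator can change either kind of gene in a single step, which is the point of searching the enlarged space jointly.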

Summary

Introduction

The fitting score is calculated, for completeness, as the “mean-minus-two-sigma” of fitted correlation coefficients and fitted balanced accuracy. This notwithstanding, their predicted D1 affinities came close to observed D5 values: a quite meaningful result confirming the extrapolative prediction abilities of the built model. At this point, local SVM models and other temporary files are deleted, the chromosome, together with its attempt ID, fitting score, and fitness score, is appended to done_so_far, and the attempt subfolder, now containing only setup information and results, is moved from its temporary location on the cluster back to the working directory (except in the workstation-based implementation, where the temporary and final attempt subdirectory locations are identical).
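
The “mean-minus-two-sigma” score mentioned above can be illustrated with a short Python sketch. It assumes that the mean and standard deviation are taken over the scores of repeated runs and that a population standard deviation is used; the text does not specify either, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def mean_minus_two_sigma(scores):
    """Return mean(scores) minus twice their population standard deviation.

    Assumption: the choice of population (rather than sample) standard
    deviation is ours; the source text does not say which is used.
    """
    if not scores:
        raise ValueError("need at least one score")
    return mean(scores) - 2 * pstdev(scores)


# Hypothetical balanced accuracies from repeated runs, same mean of 0.80.
stable = [0.80, 0.80, 0.80, 0.80]
erratic = [0.95, 0.65, 0.95, 0.65]

if __name__ == "__main__":
    print(mean_minus_two_sigma(stable))
    print(mean_minus_two_sigma(erratic))
```

Both hypothetical series share the same average, but the erratic one is penalized for its spread, so a criterion of this kind favors models whose quality is reproducible across runs rather than occasionally high.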

Installation and Use
Prerequisites
Installation
Deployment-Specific Technical Parameters
User Guide
Preparing the Input Data Directory
Adding Decoys
Command-Line Launch of the libsvm Parameter Configurator
Defining the Parameter Phase Space
The Genetic Algorithm
Slave Processes
Reconstruction and Off-Package Use of Optimally Parameterized libsvm Models
Conclusions