Jürgen Bajorath and Jens Auer, pages 80–90
A method called “Emerging Chemical Patterns” (ECP) has recently been introduced as a novel approach to binary molecular classification (for example, “active” versus “inactive”). The underlying pattern recognition algorithm was originally developed in computer science and later adapted for applications in medicinal chemistry and compound screening. A special feature of the method is its ability to classify molecules accurately on the basis of very small training sets containing only a few compounds. This feature is highly relevant for virtual compound screening when only very few experimental hits are available as templates. Here we apply ECP calculations to simulate sequential screening using an experimental high-throughput screening (HTS) data set containing inhibitors of dihydrofolate reductase. In doing so, we focus on minimizing the number of database compounds that need to be evaluated in order to identify a substantial fraction of the available hits. We demonstrate that iterative ECP calculations recover on average between ∼19% and ∼39% of the available hits in the data set while dramatically reducing the number of compounds that need to be tested to between ∼0.002% and ∼9% of the screening database.
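The classification and sequential screening procedure summarized above can be illustrated with a minimal sketch. It assumes the "jumping emerging pattern" formulation: compounds are represented as sets of discretized descriptor values, and a pattern qualifies if it occurs in a sufficient fraction of known actives but in none of the known inactives. Untested compounds are ranked by the summed support of the patterns they contain, the top-ranked batch is "tested" against a hidden label set, and newly found actives and inactives are added to the training data before the next round. The descriptor names, bin indices, support threshold, scoring rule, batch size, and helper names (`mine_jumping_eps`, `sequential_screen`) are hypothetical illustrations, not the published ECP parameters or implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: an item is (descriptor name, discretized bin index),
# and a compound is the set of items it exhibits.
Item = Tuple[str, int]
Compound = FrozenSet[Item]


def support(pattern: FrozenSet[Item], compounds: List[Compound]) -> float:
    """Fraction of compounds that contain every item of the pattern."""
    if not compounds:
        return 0.0
    return sum(pattern <= c for c in compounds) / len(compounds)


def mine_jumping_eps(actives: List[Compound], inactives: List[Compound],
                     max_size: int = 2, min_support: float = 0.5) -> Dict[FrozenSet[Item], float]:
    """Patterns with support >= min_support among actives and zero support among inactives."""
    items = sorted(set().union(*actives))
    patterns: Dict[FrozenSet[Item], float] = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            p = frozenset(combo)
            s = support(p, actives)
            if s >= min_support and support(p, inactives) == 0.0:
                patterns[p] = s
    return patterns


def score(compound: Compound, patterns: Dict[FrozenSet[Item], float]) -> float:
    """Assumed scoring rule: sum the active-class support of all patterns the compound contains."""
    return sum(s for p, s in patterns.items() if p <= compound)


def sequential_screen(database: Dict[str, Compound], true_hits: Set[str],
                      train_act: List[str], train_inact: List[str],
                      batch_size: int = 2, rounds: int = 3) -> Tuple[List[str], List[str]]:
    """Iteratively mine patterns, test the top-scoring batch, and retrain on the results."""
    actives = [database[i] for i in train_act]
    inactives = [database[i] for i in train_inact]
    seeds = set(train_act) | set(train_inact)
    untested = [i for i in database if i not in seeds]
    tested: List[str] = []
    found: List[str] = []
    for _ in range(rounds):
        patterns = mine_jumping_eps(actives, inactives)
        # Stable sort: ties keep database order.
        untested.sort(key=lambda i: score(database[i], patterns), reverse=True)
        selected, untested = untested[:batch_size], untested[batch_size:]
        for i in selected:
            tested.append(i)
            if i in true_hits:  # the hidden labels act as the "experimental assay"
                found.append(i)
                actives.append(database[i])
            else:
                inactives.append(database[i])
    return found, tested
```

The quantity of interest in a simulation like this is the ratio of `len(found)` to the number of available hits, set against `len(tested)` as a fraction of the database, which mirrors the two percentages reported above.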
Keywords: Pattern recognition, data mining, descriptors, molecular similarity, molecular classification, structure-activity relationships, virtual screening, iterative screening
Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany.