Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling
Title: Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling
Authors: Le, Thi Thu Huong
Keywords: ADME modeling, Caco-2 cell permeability, Biopharmaceutics classification system, Support vector machine, Cost-sensitive learning, Resampling technique
Issue Date: 2015
Publisher: Springer
Abstract: In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data could negatively affect the classification performance of machine learning algorithms. Solutions for handling imbalanced datasets have been proposed, but their application to ADME modeling tasks is underexplored. In this paper, various strategies including cost-sensitive learning and resampling methods were studied to tackle the moderate imbalance problem of a large Caco-2 cell permeability database. Simple physicochemical molecular descriptors were utilized for data modeling. Support vector machine classifiers were construct