Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling

Authors: Le, Thi Thu Huong
Keywords: ADME modeling, Caco-2 cell permeability, Biopharmaceutics classification system, Support vector machine, Cost-sensitive learning, Resampling technique
Issue Date: 2015
Publisher: Springer
Abstract: In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data could negatively affect the classification performance of machine learning algorithms. Solutions for handling imbalanced datasets have been proposed, but their application to ADME modeling tasks is underexplored. In this paper, various strategies, including cost-sensitive learning and resampling methods, were studied to tackle the moderate imbalance problem of a large Caco-2 cell permeability database. Simple physicochemical molecular descriptors were utilized for data modeling. Support vector machine classifiers were constructed and compared using multiple comparison tests. Results showed that the models developed on the basis of resampling strategies displayed better performance than the cost-sensitive classification models, especially in the case of oversampling, where the misclassification rates for the minority class were 0.11 and 0.14 for the training and test sets, respectively. A consensus model with an enhanced applicability domain was subsequently constructed and showed improved performance. This model was used to predict a set of randomly selected high-permeability reference drugs according to the biopharmaceutics classification system. Overall, this study provides a comparison of numerous rebalancing strategies and displays the effectiveness of oversampling methods in dealing with imbalanced permeability data problems.
URI: http://repository.vnu.edu.vn/handle/VNU_123/11505
ISSN: 1381-1991
Appears in Collections: SMP - Papers / Conference and Workshop Papers
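
Illustration: the abstract reports that oversampling gave the lowest minority-class misclassification rates but does not say which oversampling algorithm was used. The sketch below is therefore only an assumption: it shows plain random oversampling (duplicating minority-class samples until the classes are equal in size) and how a minority-class misclassification rate could be computed. The function names `random_oversample` and `minority_error_rate` and the toy descriptor values are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # All original samples that belong to this class.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def minority_error_rate(y_true, y_pred, minority_label):
    """Fraction of true minority-class samples predicted as another class."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == minority_label]
    if not pairs:
        raise ValueError("no samples with the minority label")
    return sum(t != p for t, p in pairs) / len(pairs)


# Toy data: one made-up descriptor per compound, label 1 is the minority class.
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
y = [0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
print(balanced_counts)
```

In practice the resampling would be applied to the training set only, so that the test set keeps its original class distribution when the minority-class error rate is measured.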