PredART: Uncertainty-quantified Machine Learning Prediction of Androgen Receptor Agonists Overcoming Imbalanced Dataset

Jang, Jidon; Na, Dokyun; Oh, Kwang-Seok

doi:10.2174/0115748936355551241220190451

Abstract

Aim: This study aims to develop and validate a machine learning-based model for the accurate prediction of androgen receptor (AR) agonistic toxicity, addressing the challenges that data imbalance poses for existing predictive models.

Background: Anomalous agonistic activity of the androgen receptor is a known major indicator of reproductive toxicity, which can lead to prostate cancer. Machine learning-based models have been developed for the rapid prediction of such agonists. However, the existing models have exhibited biased learning outcomes and low sensitivity because the available training data are imbalanced. In the early screening stage of drug discovery, this low sensitivity can hinder the detection of potentially toxic compounds.

Objective: The objective of this study is to develop a machine learning prediction model that classifies whether a drug candidate is an androgen receptor agonist, with more balanced performance than existing models.

Methods: PredART is a bootstrap aggregated k-nearest neighbor model for the balanced prediction of androgen receptor agonistic toxicity, built using 381 active and 8,089 inactive compounds represented by their structural features.

Result: In this work, we propose an advanced model that combines the bootstrap aggregating algorithm with machine learning binary classifiers to identify androgen receptor-based reproductive toxicity while avoiding biased prediction results. The optimal model, which uses k-nearest neighbor classifiers, achieved an accuracy of 0.831, a positive predictive value (PPV) of 0.882, a sensitivity of 0.625, a specificity of 0.951, and a Matthews correlation coefficient (MCC) of 0.633 on external test data, demonstrating a significant improvement in sensitivity compared to the previous study and achieving balanced learning.
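
The five reported metrics all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions in Python. The function name `binary_metrics` and the example counts are assumptions made for illustration: the counts are hypothetical and were chosen only because they happen to round to the values quoted above; the record does not state the actual confusion matrix or test-set size.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class or prediction group is empty.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "ppv": ratio(tp, tp + fp),            # positive predictive value (precision)
        "sensitivity": ratio(tp, tp + fn),    # recall on the active class
        "specificity": ratio(tn, tn + fp),    # recall on the inactive class
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts (NOT from the source): one confusion matrix whose
# rounded metrics coincide with the figures quoted in the abstract.
example = binary_metrics(tp=15, fn=9, fp=2, tn=39)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells, which is why it is often reported alongside sensitivity when the classes are imbalanced.
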
Furthermore, the standard deviation among the outputs of the individual classifiers was calculated and used as a prediction-uncertainty metric for selecting reliable predictions, which further enhanced the model's performance.

Conclusion: Built on the bootstrap aggregating algorithm, our prediction model effectively addressed data imbalance; various machine learning and deep learning classifiers were also evaluated as a benchmark. Additionally, by quantifying uncertainty, our model provided an intuitive assessment of prediction reliability during large-scale screening processes. © 2025 Bentham Science Publishers.
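
The combination of bootstrap aggregation, k-nearest neighbor base classifiers, and a standard-deviation uncertainty score can be illustrated with a small standard-library Python sketch. This is not the authors' implementation. Several details are assumptions that the abstract does not specify: each base classifier is trained on a class-balanced bootstrap sample, distances are Euclidean, each classifier outputs the fraction of active neighbors, and the uncertainty cutoff of 0.2 is arbitrary. The toy two-dimensional points stand in for real structural features, and all function names are hypothetical.

```python
import math
import random
import statistics


def knn_active_fraction(train, query, k):
    """Fraction of the k nearest training points (Euclidean) labelled active (1)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(label for _, label in nearest) / k


def bagged_knn_predict(actives, inactives, query, n_estimators=25, k=3, seed=0):
    """Return (mean, standard deviation) of the base-classifier outputs for one query.

    Assumed scheme: every base classifier sees len(actives) draws with
    replacement from each class, so its training sample is balanced.
    """
    rng = random.Random(seed)
    n = len(actives)
    outputs = []
    for _ in range(n_estimators):
        sample = [(x, 1) for x in rng.choices(actives, k=n)]
        sample += [(x, 0) for x in rng.choices(inactives, k=n)]
        outputs.append(knn_active_fraction(sample, query, k))
    return statistics.mean(outputs), statistics.pstdev(outputs)


# Toy imbalanced data: 5 actives near (2, 2) and 50 inactives near the origin.
actives = [(2.0 + 0.1 * i, 2.0) for i in range(5)]
inactives = [(0.1 * (i % 10), 0.1 * (i // 10)) for i in range(50)]

UNCERTAINTY_CUTOFF = 0.2  # arbitrary illustrative threshold

for query in [(3.0, 3.0), (-1.0, -1.0), (1.3, 1.3)]:
    mean, std = bagged_knn_predict(actives, inactives, query)
    label = "active" if mean >= 0.5 else "inactive"
    status = "reliable" if std <= UNCERTAINTY_CUTOFF else "uncertain"
    print(f"{query}: {label} (mean={mean:.2f}, std={std:.2f}, {status})")
```

When the base classifiers agree, the standard deviation is near zero and the prediction can be kept; when the resampled training sets pull the vote in different directions, the standard deviation grows and the prediction can be set aside during screening.
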

Keywords

Androgen receptor; bootstrap aggregation; machine learning; reproductive toxicity; uncertainty quantification; CHEMICALS; CANCER; MODELS

Publication date: 2025

Type: Article

Journal: Current Bioinformatics

Volume: 20

Issue: 8

Pages: 751-759

ScholarWorks@Chung-Ang University
