PredART: Uncertainty-quantified Machine Learning Prediction of Androgen Receptor Agonists Overcoming Imbalanced Dataset

Abstract

Aim: This study aims to develop and validate a machine learning-based model for the accurate prediction of androgen receptor (AR) agonistic toxicity, addressing the challenges that data imbalance poses for existing predictive models.

Background: Anomalous agonistic activity of the androgen receptor is a known major indicator of reproductive toxicity, which can lead to prostate cancer. Machine learning-based models have been developed for the rapid prediction of such agonists. However, existing models have exhibited biased learning outcomes and low sensitivity because the available training data are imbalanced. In the early screening stage of drug discovery, low sensitivity caused by data imbalance can hinder the detection of potentially toxic compounds.

Objective: The objective of this study is to develop a machine learning prediction model that classifies whether a drug candidate is an androgen receptor agonist, with more balanced performance than existing models.

Methods: PredART is a bootstrap-aggregated k-nearest neighbor model for the balanced prediction of androgen receptor agonistic toxicity, trained on 381 active and 8,089 inactive compounds described by their structural features.

Results: In this work, we propose an advanced model that combines the bootstrap aggregating algorithm with machine learning binary classifiers to identify androgen receptor-based reproductive toxicity while avoiding biased prediction results. The optimal model, which uses k-nearest neighbor classifiers, achieved an accuracy of 0.831, a positive predictive value (PPV) of 0.882, a sensitivity of 0.625, a specificity of 0.951, and a Matthews correlation coefficient (MCC) of 0.633 on external test data, demonstrating a significant improvement in sensitivity over the previous study and achieving balanced learning. Furthermore, calculating the standard deviation among the outputs of the classifiers and using this prediction uncertainty as a screening metric to select reliable predictions further enhanced the model's performance.

Conclusion: Built on the bootstrap aggregating algorithm, our prediction model effectively addressed data imbalance, and various machine learning and deep learning classifiers were evaluated within it as a benchmark. Additionally, by quantifying uncertainty, our model provided an intuitive assessment of prediction reliability during large-scale screening processes.

© 2025 Bentham Science Publishers.
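
The approach summarized in the abstract (bootstrap aggregation of k-nearest neighbor classifiers, with the standard deviation across classifier outputs used as an uncertainty score) can be illustrated with a minimal sketch. This is not the authors' implementation. The class-balanced resampling of each bootstrap sample, the Euclidean distance, the toy two-dimensional features, and every parameter value (`n_models`, `k`, `max_std`) are assumptions made here for illustration; the abstract does not specify them.

```python
import math
import random
import statistics


def knn_score(train_X, train_y, x, k):
    """Fraction of actives (label 1) among the k nearest training points to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def balanced_bootstrap(X, y, rng):
    """Bootstrap sample with equal class counts (an assumed balancing scheme)."""
    actives = [i for i, label in enumerate(y) if label == 1]
    inactives = [i for i, label in enumerate(y) if label == 0]
    n = min(len(actives), len(inactives))
    # Sample with replacement from each class separately.
    idx = rng.choices(actives, k=n) + rng.choices(inactives, k=n)
    return [X[i] for i in idx], [y[i] for i in idx]


def bagged_knn(X, y, queries, n_models=25, k=3, seed=0):
    """Return (mean score, standard deviation) per query across the ensemble."""
    rng = random.Random(seed)
    bags = [balanced_bootstrap(X, y, rng) for _ in range(n_models)]
    results = []
    for q in queries:
        scores = [knn_score(bx, by, q, k) for bx, by in bags]
        results.append((statistics.fmean(scores), statistics.pstdev(scores)))
    return results


def screen(results, max_std):
    """Keep (query index, mean score) pairs whose ensemble std is at most max_std."""
    return [(i, mean) for i, (mean, std) in enumerate(results) if std <= max_std]


# Toy imbalanced data: 200 inactives near (0, 0), 10 actives near (3, 3).
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
y = [0] * 200 + [1] * 10

queries = [(0.0, 0.0), (3.0, 3.0), (1.5, 1.5)]
results = bagged_knn(X, y, queries)
for q, (mean, std) in zip(queries, results):
    print(q, "predicted active" if mean >= 0.5 else "predicted inactive",
          f"mean={mean:.3f}", f"std={std:.3f}")

# Hypothetical threshold: retain only low-uncertainty predictions.
print(screen(results, max_std=0.2))
```

In this sketch, queries that lie between the two classes tend to receive a larger standard deviation than queries deep inside one class, so a threshold on the standard deviation filters out the less reliable predictions, which is the screening idea the abstract describes.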

Keywords

Androgen receptor; bootstrap aggregation; machine learning; reproductive toxicity; uncertainty quantification; CHEMICALS; CANCER; MODELS
Title
PredART: Uncertainty-quantified Machine Learning Prediction of Androgen Receptor Agonists Overcoming Imbalanced Dataset
Authors
Jang, Jidon; Na, Dokyun; Oh, Kwang-Seok
DOI
10.2174/0115748936355551241220190451
Publication date
2025
Type
Article
Journal
Current Bioinformatics
Volume
20
Issue
8
Pages
751-759
