Prediction of aqueous solubility of drug-like molecules using a novel algorithm for automatic adjustment of relative importance of descriptors implemented in counter-propagation artificial neural networks
In this work, we present a novel approach to developing models for the prediction of aqueous solubility, based on an algorithm for the automatic adjustment of descriptors' relative importance (AARI) implemented in counter-propagation artificial neural networks (CPANN). This approach significantly improved the interpretability of models based on artificial neural networks, which are traditionally regarded as "black box" models. The model was developed on a data set of 374 diverse drug-like molecules, divided into training (n=280) and test (n=94) sets using self-organizing maps. A heuristic method was applied to preselect a small number of the most significant descriptors to serve as inputs for CPANN training. The performance of the final model, based on 7 descriptors, was satisfactory for both the training set (RMSEP(train)=0.668) and the test set (RMSEP(test)=0.679). The model was found to be highly interpretable in terms of solubility and helped rationalize structural features that could affect the solubility of the compounds investigated. Therefore, the proposed approach can significantly enhance model usability by providing guidance for structural modifications of compounds aimed at improving solubility in the early phase of drug discovery.
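For readers unfamiliar with the reported error metric, the following is a minimal sketch of how an RMSEP value could be computed, assuming RMSEP denotes the usual root-mean-square error of prediction over paired observed and predicted solubility values. The function name `rmsep` and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def rmsep(observed, predicted):
    """Root-mean-square error of prediction over paired values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one pair of values is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for illustration only, not data from the study.
observed = [-2.0, -3.5, -1.0, -4.2]
predicted = [-2.5, -3.0, -1.5, -4.7]
print(rmsep(observed, predicted))
```

Applying the same function separately to the training and test compounds would give one value per set, which is how the two figures quoted above could be compared.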